General Relativity for the Gifted Amateur

Tom Lancaster
Durham University

Stephen J. Blundell
University of Oxford

OXFORD
UNIVERSITY PRESS

Great Clarendon Street, Oxford, OX2 6DP, United Kingdom
Oxford University Press is a department of the University of Oxford. It furthers the University's objective of excellence in research, scholarship, and education by publishing worldwide. Oxford is a registered trade mark of Oxford University Press in the UK and in certain other countries
© Tom Lancaster and Stephen Blundell, 2025
The moral rights of the authors have been asserted
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, transmitted, used for text and data mining, or used for training artificial intelligence, in any form or by any means, without the prior permission in writing of Oxford University Press, or as expressly permitted by law, by licence or under terms agreed with the appropriate reprographics rights organization. Enquiries concerning reproduction outside the scope of the above should be sent to the Rights Department, Oxford University Press, at the address above.
You must not circulate this work in any other form and you must impose this same condition on any acquirer.
Published in the United States of America by Oxford University Press, 198 Madison Avenue, New York, NY 10016, United States of America
British Library Cataloguing in Publication Data
Data available
Library of Congress Control Number: 2024946925
ISBN 9780192867407
ISBN 9780192867414 (pbk.)
DOI: 10.1093/oso/9780192867407.001.0001
Printed and bound by CPI Group (UK) Ltd, Croydon, CR0 4YY
Cover image: The authors
Links to third party websites are provided by Oxford in good faith and for information only. Oxford disclaims any responsibility for the materials contained in any third party website referenced in this work.

Preface

I saw Eternity the other night
Like a great Ring of pure and endless light,
All calm as it was bright,
And round beneath it, Time in hours, days, years
Driv'n by the spheres
Like a vast shadow mov'd. In which the world
And all her train were hurl'd.
Henry Vaughan (1621-1695) The World
You sometimes speak of gravity as essential and inherent to matter. Pray do not ascribe that notion to me, for the cause of gravity is what I do not pretend to know and therefore would take more time to consider of it.
Sir Isaac Newton (1642-1726) Letter to Richard Bentley
Albert Einstein's crowning theoretical achievement was his formulation of his general theory of relativity, a theory of gravity that superseded Isaac Newton's approach and transformed our view of the Universe. It took a century for one of its key predictions, the existence of gravitational waves, to be verified, but it is a measure of how persuasive its underlying principles have been that no-one seriously doubted that gravitational waves would eventually be detected. General relativity engages profoundly with the nature of space and time (even more so than Henry Vaughan did in his magnificent poem quoted above) and provides the key ideas missing from Newton's theory (the deficiency of which Newton was keenly aware, as indicated in the second quotation). Our point of view in writing this book is that everyone should have the opportunity to engage with this beautiful theory which, conceptually, is based on simple ideas from the physics of fields. The mastery of the machinery of general relativity does however require some facility with mathematics that is likely to be unfamiliar to many students of physics, but the payoff is so great and the material so stimulating that we hope the reader will join us in exploring one of the greatest achievements in physics.
The text follows the same approach as our earlier Quantum Field Theory for the Gifted Amateur (QFTGA) (OUP, 2014) and this is perhaps a good moment to restate what we mean by the slightly tongue-in-cheek term 'gifted amateur'. We are not writing for the mathematically uninitiated and do assume that our reader has a background in physics. However, we are not writing for experts either and aim to provide an entry point to a profound topic that we hope readers will find both entertaining and useful. The use of the term 'gifted amateur' encourages the potential reader to have a go for themselves, conveying the feeling that a difficult subject is open to those who consider themselves non-experts. We adopt the same approach used in writing QFTGA, dividing the material into short and easily digestible chapters, spelling out mathematical steps in worked examples and illustrating the arguments with hand-drawn figures. In response to reader requests from QFTGA, we also include a large number of problems with worked solutions. We have both had a long-standing fascination with the subject and a conviction that it belongs more centrally in the physics curriculum. However, we have not lost the memory of finding some of this material difficult and so hope that our book will give a curious reader a more patient and illuminating guide to this subject than they would find in some of the more weighty and established tomes.
The first two parts of the book introduce the main concepts that lead to the formulation of the Einstein field equations, and this material concludes with an outline description of the most important implications of the theory. These implications are worked out in much more detail in the middle section of the book, the third part covering cosmology and the fourth part detailing the consequences for orbits and black holes. General relativity is a theory about the geometry of spacetime and a more mathematical treatment of geometry is given in the fifth part of the book for those with an appetite to explore these aspects in more detail. The final part of the book returns to field theory, framing general relativity as a classical field theory and looking forward to how it might be formulated as a quantum field theory in a future theory of quantum gravity.
In writing this volume we are particularly grateful to the following individuals who have helped us: Rodrigo Alonso, Nathan Bentley, Katherine Blundell, Theo Breeze, Harvey Brown, Andrei Constantin, Felix Flicker, Martin Galpin, Matjaž Gomilšek, Thomas Hicken, Ben Huddart, Ifan Hughes, Baojiu Li, Guillaume Mahler, and Trevor Wishart. These individuals have been generous with their time and have helped improve the book, but any errors that are found post-publication will be posted on the book's website:
http://tomlancaster.webspace.durham.ac.uk/grgabook
We are also grateful to our copy editor Aravind Kannankara. Finally, we thank Cally, Eden, and Katherine for their patience, love, and support.
TL & SJB

Contents

0 Overture
0.1 What is relativity?
0.2 What is general relativity?
0.3 What is a metric?
0.4 What are we building on?
0.5 Who is this book for?
0.6 Units in this book
Exercises
I Geometry and mechanics in flat spacetime
1 Special relativity
1.1 A common sense start
1.2 The speed of light
1.3 Light cones and the Lorentz transformation
1.4 Paths through spacetime
1.5 Experiments
Exercises
2 Vectors in flat spacetime
2.1 Vectors
2.2 Coordinate transformations
2.3 Examples of vectors
2.4 Principle of least action
Exercises
3 Coordinates
3.1 Coordinates in Euclidean space
3.2 Farewell to the position vector
3.3 Non-Euclidean space
Exercises
4 Linear slot machines
4.1 Dot products and down vectors
4.2 Vectors and 1-forms
4.3 Transformations
4.4 Tensors
4.5 Energy-momentum tensor
Exercises
5 The metric
5.1 Metrics in general
5.2 Meet some metrics
5.3 Light and light cones
5.4 Lengths, areas, volumes
Exercises
II Curvature and general relativity
6 Finding a theory of gravitation
6.1 Free fall and the equivalence principle
6.2 Why general relativity?
6.3 A differential equation to describe gravity
6.4 Local flatness
6.5 Time dilation in a gravitational field
Exercises
7 Parallel lines and the covariant derivative
7.1 Parallelism
7.2 Derivatives and connections
7.3 The covariant derivative
7.4 Parametrized paths
7.5 Enter the metric
Exercises
8 Free fall and geodesics
8.1 Extremal intervals
8.2 A geodesic equation
8.3 Inertial forces
8.4 Geodesics for photons
Exercises
9 Geodesic equations and connection coefficients
9.1 Finding connection coefficients
9.2 The geodesic equation from the action
Exercises
10 Making measurements in relativity
10.1 Observers and their observations
10.2 Coordinate and non-coordinate bases
10.3 The orthonormal frame
10.4 Freely falling frames
Exercises
11 Riemann curvature and the Ricci tensor
11.1 What is curvature?
11.2 Tidal forces
11.3 Riemann curvature
11.4 Symmetries of the Riemann tensor
11.5 The Ricci tensor and Ricci scalar
11.6 Example computations
11.7 Geodesic deviation revisited
Exercises
12 The energy-momentum tensor
12.1 Another look at the energy-momentum tensor
12.2 Example energy-momentum tensors
12.3 Classical particles
12.4 Conservation laws
Exercises
13 The gravitational field equations
13.1 Geometry: a recap of the key ingredients
13.2 Physics: the key ingredients
13.3 An incorrect guess
13.4 Einstein's field equation
Exercises
14 The triumphs of general relativity
14.1 Weak fields and the Newtonian limit
14.2 Gravitational waves
14.3 Stars, trajectories, and orbits
14.4 Cosmology
III Cosmology
15 An introduction to cosmology
15.1 The cosmological principle
15.2 The Hubble flow
15.3 Cosmic time
15.4 Universe 0: an empty universe
15.5 Universe 1: flat and expanding
Exercises
16 Robertson-Walker spaces
16.1 Spaces with constant curvature
16.2 Three Robertson-Walker spaces
16.3 Redshift and cosmic expansion
16.4 The initial singularity
Exercises
17 The Friedmann equations
17.1 Enter energy-momentum
17.2 Enter thermodynamics
17.3 Dust and radiation
Exercises
18 Universes of the past and future
18.1 Spatially flat universes
18.2 Curved universes with $\Lambda = 0$
18.3 Einstein, Lemaître and Eddington
18.4 A brief history of model universes
Exercises
19 Causality, infinity, and horizons
19.1 Penrose diagrams
19.2 The de Sitter spacetime
19.3 Big-Bang singularities
Exercises
IV Orbits, stars, and black holes
20 Newtonian orbits
20.1 Kepler's laws
20.2 Anatomy of an orbit
20.3 Effective potentials
20.4 Allowed trajectories
20.5 The why? of orbits
Exercises
21 The Schwarzschild geometry
21.1 Justifying the solution
21.2 Components of the Riemann tensor
21.3 A gravitating object
21.4 The meaning of the coordinates
Exercises
22 Motion in the Schwarzschild geometry
22.1 Constants of the motion
22.2 Gravitational redshift
22.3 Motion in Schwarzschild spacetime
22.4 Example: the radial plunge
Exercises
23 Orbits in the Schwarzschild geometry
23.1 Orbits for massive particles
23.2 Stable circular orbits
23.3 Precession of the perihelion
Exercises
24 Photons in the Schwarzschild geometry
24.1 Photon trajectories
24.2 Looking around
Exercises
25 Black holes
25.1 The surface $r = 2M$
25.2 The tortoise coordinate
25.3 Death of an astronaut
25.4 Looking around near a black hole
25.5 Gravitational collapse
Exercises
26 Black-hole singularities
26.1 Singularities
26.2 Eddington-Finkelstein coordinates
Exercises
27 Kruskal-Szekeres coordinates
27.1 Enter the Kruskal metric
27.2 Wormholes
27.3 Another Penrose diagram
Exercises
28 Hawking radiation
28.1 Hawking radiation
28.2 Black-hole thermodynamics
Exercises
29 Charged and rotating black holes
29.1 Charged black holes
29.2 Kerr black holes
29.3 Interacting with the Kerr geometry
Exercises
V Geometry
30 Classical curvature
30.1 Curvature of a line
30.2 Curvature with vectors
30.3 Two-dimensional surfaces
30.4 Gauss' equation
30.5 Intrinsic and extrinsic curvature
30.6 Riemann's project
Exercises
31 A reintroduction to geometry
31.1 Old notions of vectors and gradients
31.2 Vectors and vector fields
31.3 Linear slot machines again
31.4 Tensors again
31.5 Examples of tensor operations
Exercises
32 Differential forms
32.1 2-forms
32.2 p-forms
32.3 p-vectors
Exercises
33 Exterior and Lie derivatives
33.1 Exterior calculus
33.2 Commutators
33.3 Lie derivatives of vectors
33.4 Lie derivatives of tensors
33.5 Killing vectors
Exercises
34 Geometry of the connection
34.1 Covariant derivative in pictures
34.2 Connection and exterior derivative
34.3 Covariant derivative of tensors
34.4 The metric revisited
Exercises
35 Riemann curvature revisited
35.1 Geodesic deviation (slight return)
35.2 Components of the curvature tensor
35.3 Parallel transport again
35.4 The meaning of the Ricci tensor
Exercises
36 Cartan's method
36.1 Connection 1-forms
36.2 Two rules
36.3 Le repère mobile
36.4 Example computations
Exercises
37 Duality and the volume form
37.1 Motivation: 2-forms and flux
37.2 Hodge star operation
37.3 Volume forms
Exercises
38 Forms, chains, and Stokes' theorem
38.1 Integration
38.2 Integrating over forms
38.3 Anatomy of an integral
38.4 Boundaries and chains
38.5 Stokes' theorem
Exercises
VI Classical and quantum fields
39 Fluids as dry water
39.1 Euler's equation
39.2 Energy and Bernoulli's equation
39.3 Energy-momentum tensor
39.4 Relativistic fluids
Exercises
40 Lagrangian field theory
40.1 Matter fields
40.2 Action and equations of motion
40.3 Fields in curved spacetime
40.4 Motivating the Einstein equation
40.5 Energy-momentum tensor
40.6 Noether's theorem
40.7 The perfect fluid
Exercises
41 Inflation
41.1 Symmetry breaking
41.2 Effective potentials
41.3 Why flat?
Exercises
42 The electromagnetic field
42.1 Electric charge in a field
42.2 Faraday tensor and Maxwell equations
42.3 Gauge freedom
42.4 Geometrical electromagnetism
Exercises
43 Charge conservation and the Bianchi identity
43.1 Conserving electric charge
43.2 Electromagnetic gauge field
43.3 Gravitational curvature
Exercises
44 Gauge fields
44.1 Fibre bundles and gauge invariance
44.2 Parallel transport and field strength
Exercises
45 Weak gravitational fields
45.1 The Newtonian limit
45.2 Linearized theory of gravitation
45.3 Exploiting gauges
Exercises
46 Gravitational waves
46.1 Waves in a gauge theory
46.2 Lorenz gauge for gravitational waves
46.3 Quadrupolar radiation
46.4 Radiated energy and power
46.5 An exact solution
46.6 The discovery of gravitational waves
Exercises
47 The properties of gravitons
47.1 Force-carrying particles
47.2 Photon propagation and polarization
47.3 Graviton propagation and polarization
Exercises
48 Higher dimensional spacetime
48.1 Gauge transformations in five dimensions
48.2 Unifying electromagnetism and gravitation
Exercises
49 From classical to quantum gravity
49.1 Extra dimensions
49.2 String theory
49.3 Parametrizing the string
49.4 Strings in relativity
49.5 Superspace
49.6 Loop quantum gravity
49.7 Anti-de Sitter spacetime
49.8 Our current best guess
Exercises
50 The Big-Bang singularity
50.1 Facts about Euclidean geometry
50.2 Orthogonal geodesics in spacetime
50.3 Our Universe
Exercises
A Further reading
B Conventions and notation
B.1 Electromagnetic units
B.2 Vectors, 1-forms and tensors
B.3 Covariant derivatives
C Manifolds and bundles
C.1 Preliminaries
C.2 Maps and functions
C.3 One-to-one, into, and onto
C.4 Continuous maps
C.5 Manifolds, coordinates, and charts
C.6 Functions on the manifold
C.7 Differentiation on the manifold
C.8 Compact regions
C.9 Curves
C.10 Tangent spaces
C.11 Fibre bundles
D Embedding
Exercises
E Answers to selected problems
Index

Overture

Our Theory of Gravitation is as good as perfect: Lagrange, it is well known, has proved that the Planetary System, on this scheme, will endure forever; Laplace, still more cunningly, even guesses that it could not have been made on any other scheme.
Thomas Carlyle (1795-1881) Sartor Resartus
General relativity is one of the most profound statements in science. It is a theory of gravity that allows us to model the large-scale structure of the Universe; to understand and explain the workings of black holes; to reveal how gravity interacts with light waves and even how the Universe hosts its own, gravitational, waves. It is central to our notions of where the Universe comes from and what its eventual fate might be. The theory's conception was largely the work of one remarkable scientist.$^{1}$ General relativity is often viewed as a fearsomely difficult theory whose mastery is a rite of passage into the world of advanced physics. However, as we will show, the theory is based on simple principles which are straightforward to grasp. This initial chapter will outline the path we will take through the book and will introduce some important bits of jargon. We start with the word relativity.

0.1 What is relativity?

Newton's$^{2}$ first law states that a body with no force acting on it will move in a straight line with a uniform velocity. This statement would be true if viewed in any inertial reference frame ('inertial' here means that the reference frame, which defines the coordinates used, is not accelerating). There are lots of inertial reference frames to choose from (all moving at different speeds and in different directions with respect to each other), but in all of them, Newton's first law holds. Even before Einstein came on the scene it was possible to formulate a principle of relativity:

The principle of relativity:

Physical laws are the same in all inertial reference frames.
This implies that there is no absolute rest frame in Newtonian physics.$^{3}$ Any inertial reference frame will do, and we then have to describe motion relative to the inertial reference frame we have chosen.

$^{1}$ Albert Einstein (1879-1955).
$^{3}$ A rest frame of a particle is that frame of reference in which the particle is measured to be at rest.
Fig. 1 Juggling is best performed in (a) an inertial reference frame, or one which is accelerating constantly, rather than (b) one which has a time-varying acceleration $a(t)$.
Example 0.1
Physical processes follow simple laws in inertial frames, because we can then apply Newton's laws in their simplest form.
  • A juggler will prefer to carry out their juggling when they are standing on a fixed floor [Fig. 1(a)]. They are then in an inertial rest frame and the juggler can effectively calculate the parabolic Newtonian trajectories of all the balls in their head, just assuming the effect of gravity.
  • However, you can juggle a set of balls equally well on a moving train or in a moving plane, as long as you are travelling at a constant velocity (i.e. you are in an inertial frame). The same Newtonian laws apply as before. Einstein's special theory of relativity is concerned with these inertial frames of reference.
  • In fact, juggling will also be possible if the train or plane is in a state of constant acceleration. In that case, the juggler would not be in an inertial frame but the uniform acceleration would be indistinguishable from an additional gravitational field, and the juggler would be able to correct for this effect without difficulty, again using Newtonian laws. This idea is at the root of the equivalence principle that underlies Einstein's general theory of relativity.
  • Juggling is very difficult though if the acceleration is rapidly time-varying (i.e. if the train suddenly jolts forward or shakes backwards and forwards) because additional time-varying forces would then act on the balls [Fig. 1(b)].
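The third point of the example can be checked with a few lines of arithmetic. The following is a minimal sketch, not taken from the text: it follows a ball in the inertial ground frame, views it from a train with uniform acceleration, and compares the result with a parabola under an effective gravity $g_{\text{eff}} = g - a$. The function name `position` and all numerical values are illustrative assumptions.

```python
def position(t, r0, v0, acc):
    """Constant-acceleration trajectory r0 + v0*t + acc*t^2/2, component by component."""
    return tuple(r + v * t + 0.5 * a * t * t for r, v, a in zip(r0, v0, acc))

# Illustrative values (assumptions): x is horizontal, y is vertical, SI units.
g = (0.0, -9.81)        # gravitational acceleration
a_train = (2.0, 0.0)    # uniform acceleration of the train
r0 = (0.0, 1.5)         # where the ball leaves the juggler's hand
v0 = (0.5, 4.0)         # launch velocity (train and ground frames agree at t = 0)

# Effective gravity felt by the juggler on the train.
g_eff = tuple(gi - ai for gi, ai in zip(g, a_train))

max_diff = 0.0
for step in range(41):
    t = 0.02 * step
    ball_ground = position(t, r0, v0, g)                          # inertial frame
    train_origin = position(t, (0.0, 0.0), (0.0, 0.0), a_train)   # where the train has got to
    seen_from_train = tuple(b - o for b, o in zip(ball_ground, train_origin))
    predicted = position(t, r0, v0, g_eff)                        # parabola under g_eff
    mismatch = max(abs(s - p) for s, p in zip(seen_from_train, predicted))
    max_diff = max(max_diff, mismatch)

print(f"largest mismatch: {max_diff:.2e} m")
```

The two descriptions agree to rounding error, which is why the juggler on the train only has to get used to a tilted and slightly stronger 'gravity'. Making `a_train` depend on time would break this agreement, which is the situation of Fig. 1(b).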
The principle of relativity is, in effect, a symmetry principle. It tells us that physics works in the same way, however we choose our coordinates, as long as our coordinates are described relative to an inertial reference frame. We can transform from one inertial set of coordinates to another by rotating, translating, or even what we will call 'boosting'. A boost is a transformation to another coordinate system moving with uniform velocity with respect to the initial one.

Example 0.2

An example of the independence of physics to boosts that is familiar to many is the sensation one experiences when seated on a train in a station and observing a neighbouring train moving forward. For a moment, you might think that your train is moving backward, and you need to check some fixed object on the station platform before you are sure which situation has occurred, and in effect whether your train is still in the station reference frame or in a new boosted reference frame.
The principle of relativity has been understood for a long time; Newton and Galileo accepted it. As we will explore in more detail in Chapter 1, Einstein's first revolutionary step, made in 1905, was to add an additional postulate:
The principle of invariant light speed:
As measured in any inertial reference frame, light propagates in empty space with a definite speed $c$ that is independent of the state of motion of the emitting body.
This principle has all manner of strange consequences that form the basis of Einstein's special theory of relativity. Why is it special? Because it is a theory that focuses on inertial reference frames and ignores gravity. Thus, it is restricted to some special (but important) cases. A good physical theory is said to be covariant if it transforms sensibly$^{4}$ under coordinate transformations. Special relativity is a theory which is covariant with respect to translations, rotations, reflections, and boosts. The boosts have to be carried out consistently with respect to the principle of invariant light speed and we will see in Chapter 1 that this must be carried out using a Lorentz transformation. Thus special relativity is said to be a theory which possesses Lorentz covariance.
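To see why an ordinary (Galilean) boost will not do, it may help to try both kinds of boost on a pulse of light. The sketch below is an illustration rather than the book's derivation: it assumes the standard form of a Lorentz boost along $x$, namely $t' = \gamma(t - vx/c^2)$ and $x' = \gamma(x - vt)$ with $\gamma = 1/\sqrt{1 - v^2/c^2}$, and the function names are hypothetical.

```python
import math

C = 299_792_458.0  # speed of light in m/s

def galilean_boost(t, x, v):
    """Newtonian change of frame: time is untouched, positions slide by v*t."""
    return t, x - v * t

def lorentz_boost(t, x, v):
    """Standard Lorentz boost along x (assumed form; the text defers details to Chapter 1)."""
    gamma = 1.0 / math.sqrt(1.0 - (v / C) ** 2)
    return gamma * (t - v * x / C**2), gamma * (x - v * t)

def speed_between(event_a, event_b):
    """Average speed of something travelling from event_a to event_b, each given as (t, x)."""
    (t_a, x_a), (t_b, x_b) = event_a, event_b
    return (x_b - x_a) / (t_b - t_a)

# Two events on the path of a light pulse: emission, and reception one second later.
emit = (0.0, 0.0)
receive = (1.0, C * 1.0)
v = 0.6 * C  # velocity of the boosted frame (illustrative choice)

galilean_speed = speed_between(galilean_boost(*emit, v), galilean_boost(*receive, v))
lorentz_speed = speed_between(lorentz_boost(*emit, v), lorentz_boost(*receive, v))

print(f"Galilean boost: light moves at {galilean_speed / C:.3f} c")
print(f"Lorentz boost:  light moves at {lorentz_speed / C:.3f} c")
```

Under the Galilean boost the pulse appears to travel at $c - v$, in conflict with the principle of invariant light speed, whereas the Lorentz boost returns $c$ in the new frame.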
Special relativity tells us that nothing can go faster than light. Thus, on a spacetime diagram, that is, a graph with time running up the page with spatial coordinates perpendicular, an observer's future and past can be represented as being inside a forward and backward light cone (see Fig. 2). Anything the observer can do now (throw a stone, shine a torch) can only influence the region of spacetime inside, or on, the forward light cone; anything that influences the observer now (the appearance of the night sky, an assassin's bullet) can only originate from inside, or on, the backward light cone. Moreover, if we populate spacetime with lots and lots of observers at different points, each will have their own light cone and all these light cones will be oriented in the same way [see Fig. 3(a)]. We shall see that this is a description of what is known as flat spacetime and is the situation that we assume to hold in special relativity.
Fig. 3 (a) The light cones in flat spacetime all line up at different points, like soldiers on parade. (b) The light cones in curved spacetime look much more disorderly, as if some of the soldiers on parade now have too much alcohol in their bloodstream.
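The light-cone picture of flat spacetime amounts to a simple test on the coordinates of an event. As a rough sketch (the function name `classify_event` and the default of units with $c = 1$ are assumptions made for illustration), an event at $(t, x, y, z)$ can be sorted relative to an observer at the origin by the sign of $(ct)^2 - x^2 - y^2 - z^2$:

```python
def classify_event(t, x, y, z, c=1.0):
    """Locate an event relative to the light cone of an observer at the origin of flat spacetime."""
    # Positive inside the cone, zero on it, negative outside it.
    s2 = (c * t) ** 2 - (x * x + y * y + z * z)
    if s2 < 0:
        return "outside the light cone"
    region = "on" if s2 == 0 else "inside"
    if t > 0:
        return f"{region} the forward light cone"
    if t < 0:
        return f"{region} the backward light cone"
    return "at the observer"  # only reached when t = x = y = z = 0

print(classify_event(2.0, 1.0, 0.0, 0.0))    # somewhere a thrown stone could reach
print(classify_event(-5.0, 3.0, 4.0, 0.0))   # starlight arriving at the observer now
print(classify_event(1.0, 2.0, 0.0, 0.0))    # too far away to be influenced in time
```

The exact comparison with zero is only reliable for tidy inputs such as these; with measured data one would allow a tolerance. Events outside the cone can neither influence the observer now nor be influenced by anything the observer does now.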

0.2 What is general relativity?

This book is about Einstein's general theory of relativity in which gravity is described. To understand the significance of what Einstein did, it is helpful to first take a step back. Newton constructed a theory of gravity, published in his Principia in 1687, which meant that for the first time it was possible to appreciate that the same force that caused the Moon to orbit the Earth also caused the famous (and probably apocryphal) apple to fall from the tree. Newtonian gravity could be described by an equation, $F = GMm/r^{2}$, relating$^{5}$ the magnitude of the force $F$ between masses $M$ and $m$ separated by distance $r$. This inverse-square relationship beautifully explains the elliptical motions of the planets and led to many people thinking that gravity was a done deal.$^{6}$ However, there was a fly in the ointment. Newton had shown how gravity behaves, but he had not explained what it was.$^{7}$ Mechanical explanations were popular in the seventeenth century (it was, after all, the golden age of clockwork mechanisms) and in Newton's theory of gravity it was not possible to see where the gear wheels were located; there was no mechanism, no machinery, just an influence teleporting itself through empty space; it made no sense. What was transmitting the gravity through space? And what even was gravity anyway? It took Einstein's genius to realize that gravity isn't something that just gets transmitted through space. Space, or more accurately spacetime, is a structural property of the gravitational field, with the curvature in the very fabric of spacetime itself [see Fig. 3(b)] being directly determined by the matter within it. In the beautiful phrase coined by Wheeler:$^{8}$
Spacetime tells matter how to move; matter tells spacetime how to curve.
Fig. 2 The light cone in a spacetime diagram. Time is plotted vertically and the horizontal plane represents two of the three orthogonal spatial directions. An observer at the origin has the potential to influence any event inside her forward light cone and be influenced by any event inside her backward light cone.
$^{4}$ Clearly the notion of a 'sensible' transformation requires some explanation. For now, it can be thought of as the requirement that equations take the same form after transformation. This implies that no new terms should appear in an equation upon transformation to a different system of coordinates.
$^{5}$ $G$ is the gravitational constant $6.6741 \times 10^{-11}\,\mathrm{N\,kg^{-2}\,m^{2}}$, measured first by Henry Cavendish (1731-1810) in 1798.
$^{6}$ Hence, the attitude of Thomas Carlyle in the quotation (written in 1836) that opened this chapter.
$^{7}$ Newton knew this, as can be seen in the quotation heading the Preface to this book on page v. Newton had described the force produced by a distant mass, but a real force was felt to require a cause, and Newton couldn't come up with one. In the 1717 preface to his book on 'Opticks', he stated that he would not be taking gravity as an essential property of matter because he didn't know its cause, 'because I am not yet satisfied about it for want of experiments'.
$^{8}$ John Archibald Wheeler (1911-2008).
$^{9}$ General relativity is a classical field theory. By classical we mean that the theory is not compatible with quantum mechanics. The search for a quantum theory of gravitation is still ongoing, a matter we will return to in Chapter 49.
$^{10}$ By matter fields we mean those fields describing massive particles or massive fluids, and also those describing phenomena such as electromagnetism, which is represented by a field with energetic, but massless, excitations.
General relativity is a field theory that describes gravity. A field is a machine that takes a position in spacetime and outputs an object representing the amplitude of something at that point in spacetime. The amplitude could be a scalar, a vector, a tensor etc.$^{9}$ Field theories describe matter, such that we speak of the electromagnetic field as describing light and charges, of particle fields as describing elementary particles and of the fluid field as describing the dynamics of continuous fluids. General relativity tells us that the effects we call gravitational reflect the energy content of all of the matter fields$^{10}$ in the Universe. What makes general relativity unique as a field theory is that the energy of these matter fields, and hence gravitation itself, is inextricably linked to another, very special, field: the metric field that describes the geometry of space and time.
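The idea of a field as a machine translates almost directly into code: a function that accepts a point in spacetime and returns an amplitude. The sketch below uses Newtonian gravity around a point mass purely as a familiar illustration; the function names, the choice of the Earth's mass and radius, and the use of Newtonian rather than relativistic fields are all assumptions made here. It shows one scalar-valued and one vector-valued field.

```python
import math

G = 6.6741e-11        # gravitational constant in N kg^-2 m^2, as quoted in this chapter
EARTH_MASS = 5.972e24  # kg (illustrative choice of source)

def potential(event, mass=EARTH_MASS):
    """Scalar field: takes an event (t, x, y, z), returns the Newtonian potential in J/kg."""
    t, x, y, z = event  # this static field happens not to depend on t
    r = math.sqrt(x * x + y * y + z * z)
    return -G * mass / r

def acceleration(event, mass=EARTH_MASS):
    """Vector field: takes an event (t, x, y, z), returns the gravitational acceleration in m/s^2."""
    t, x, y, z = event
    r = math.sqrt(x * x + y * y + z * z)
    k = -G * mass / r**3
    return (k * x, k * y, k * z)

# Feed the machines an event on the Earth's surface (radius roughly 6.371e6 m).
surface_event = (0.0, 6.371e6, 0.0, 0.0)
phi = potential(surface_event)
g_vec = acceleration(surface_event)
g_mag = math.sqrt(sum(component**2 for component in g_vec))

print(f"potential: {phi:.3e} J/kg")
print(f"acceleration: {g_mag:.2f} m/s^2, pointing towards the centre")
```

A tensor field works in the same way, except that the output at each event is an object with more components; the metric field introduced next is of this kind.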
The clearest expression of how general relativity describes gravitation is the Einstein equation. This may be written conceptually as
$$
\begin{equation*}
\binom{\text { Curvature of }}{\text { spacetime }}=\binom{\text { Energy density }}{\text { of matter fields }} . \tag{1}
\end{equation*}
$$
The left-hand side of the Einstein equation is geometrical. The curvature is a geometrical property of space and time that follows from the metric field. The right-hand side of the Einstein equation is physical and reflects fields that describe the content of the Universe.
In formulating general relativity, Einstein began from this intuition, but initially struggled with the details of how curvature can be described mathematically using geometrical techniques that were unfamiliar to him. In the century since Einstein's monumental work, there has been a great deal of progress in both the techniques and presentation of geometry, not least following the work of Élie Cartan,${ }^{11}$ but the reputation for difficulty that general relativity enjoys can still be traced back to the mathematical barrier this material presents to new students of gravitation. In fact, Einstein was helped by a friend, the mathematician Marcel Grossmann,${ }^{12}$ to master geometry, but despite this, Einstein worked tirelessly for a further decade before the theory was complete. In this spirit of friendly help, the opening sections of this book are designed to help the gifted amateur understand the mathematical language of the left-hand (geometrical) side of the Einstein equation, but in due course we will fill in the details of both sides.

0.3 What is a metric?

General relativity concerns the metric field, but what is that? The metric field can be thought of as a set of rules that allow us to work out the distances and angles between points in space and time. The geometrical description links space and time so inseparably that we refer to them as a single entity:${ }^{13}$ spacetime. The metric field then describes geometry by providing the distances and angles between points in spacetime, known as events. The metric itself can be expressed by writing down the metric line element, which is an equation for the interval between two closely spaced events. This allows us to carry out thought experiments where we imagine that spacetime has various particular curvatures and then investigate the consequences.
Ancient Alexandria's great mathematician Euclid${ }^{14}$ was never able to prove his parallel postulate: the intuition that two lines that start parallel will continue to be parallel out to infinity. It was realized by geometers in the eighteenth and nineteenth centuries that this is only true for a flat plane and that consistent geometries where parallel lines converge or diverge are possible in curved spaces. The deviation of parallel lines from parallelism gives us a test for, and measure of, curvature. In other words, any relative motion of two small, uncharged test particles, set off at the same speeds on parallel paths, must be the consequence of a gravitational field. The information about curvature is encoded in the metric. Einstein's equation is a differential equation that, when solved for a distribution of matter, gives us access to a metric field.
The metric is a field because, in general, it varies throughout spacetime. That is to say we insert a position in spacetime into the metric field and we are returned with a metric that allows us to compute the distance between events in that part of spacetime. The left-hand side of the Einstein equation can be thought of as a differential equation describing the variation of the metric field in spacetime and hence we obtain our notion of the curvature of spacetime [see Fig. 3(b)].
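The idea of a metric as a position-dependent rule for distances can be sketched in a few lines of Python. The example below is only an illustration under stated assumptions: it uses the surface of a sphere of radius `a` (a two-dimensional curved space, not a spacetime), with coordinates chosen as colatitude and longitude, and the function names `sphere_metric` and `interval` are hypothetical. The same coordinate step gives different distances at different places, because the metric returned at each point is different.

```python
import math

def sphere_metric(theta, phi, a=1.0):
    """Metric components at the point (theta, phi) on a sphere of radius a.

    Assumed coordinates: theta is the colatitude and phi the longitude (radians).
    The line element is ds^2 = a^2 dtheta^2 + a^2 sin^2(theta) dphi^2.
    """
    return [[a**2, 0.0],
            [0.0, (a * math.sin(theta))**2]]

def interval(metric, point, displacement):
    """Distance between `point` and a closely spaced neighbouring point."""
    g = metric(*point)
    ds_squared = sum(g[i][j] * displacement[i] * displacement[j]
                     for i in range(2) for j in range(2))
    return math.sqrt(ds_squared)

# The same small step in longitude, taken at two different places.
step = (0.0, 1.0e-3)
ds_equator = interval(sphere_metric, (math.pi / 2, 0.0), step)
ds_high_latitude = interval(sphere_metric, (math.pi / 6, 0.0), step)
print(ds_equator, ds_high_latitude)
```

The step covers about half the distance at colatitude $\pi / 6$ that it covers on the equator, which is the sense in which the metric must be supplied point by point.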
${ }^{11}$ Élie Joseph Cartan (1869-1951).
${ }^{12}$ Marcel Grossmann (1878-1936).
${ }^{13}$ This notion of a single entity requires another conceptual leap: the coordinates in the metric are of no intrinsic significance. The symbol $t$, to which we have grown accustomed for representing time, becomes less important.
${ }^{14}$ Euclid (who lived around 300 BC).
${ }^{15}$ Why? The Newtonian theory is known to provide a good description of gravitation in many of the circumstances in which we encounter it, i.e. the limit of small gravitational interactions and of particles travelling slowly compared to the speed of light.
Fig. 4 The gravitational field $\vec{g}(\vec{r})$ around a particle of mass $M$.
Fig. 5 The gravitational potential $\Phi(r)$ (a scalar field) at a distance $r$ from a particle of mass $M$ at the origin.
${ }^{16}$ Henry Cavendish used a torsion balance to measure the tiny gravitational attraction between lead spheres.

0.4 What are we building on?

General relativity supersedes Newton's theory of gravity, but the two theories should agree if the gravitational fields are weak.${ }^{15}$ Therefore, it is worth restating the older Newtonian theory: Newton asserted that the force $\vec{F}$ on a point mass $m$ at position $\vec{r}$ due to a point mass $M$ at the origin is given by the vector equation

$$
\begin{equation*}
\vec{F}=-\frac{G M m}{r^{2}} \hat{\vec{r}}, \tag{2}
\end{equation*}
$$

in which arrows denote three-dimensional vectors, and the minus sign expresses the fact that gravity is an attractive force. Since the force scales with the mass $m$, we can define a gravitational field vector $\vec{g}$ as the force per unit mass, i.e.

$$
\begin{equation*}
\vec{F}=m \vec{g}, \tag{3}
\end{equation*}
$$

and this is in fact a vector field $\vec{g}(\vec{r})$ that depends on position $\vec{r}$. For a point mass, we then have (see Fig. 4)

$$
\begin{equation*}
\vec{g}(\vec{r})=-\frac{G M \hat{\vec{r}}}{r^{2}} . \tag{4}
\end{equation*}
$$

The gravitational field is a conservative field of force (meaning the net work done in moving a point mass around a closed loop is zero, basically that the work done in rolling a ball up a hill is equivalent to the energy liberated when it rolls back down again), and hence we can write it as the gradient of a scalar potential. Conventionally, we include a minus sign and so write

$$
\begin{equation*}
\vec{g}=-\vec{\nabla} \Phi, \tag{5}
\end{equation*}
$$

where $\Phi(\vec{r})$ is a scalar field known as the gravitational potential. For the case of a point mass $M$ at the origin, $\Phi(\vec{r})=-G M / r$ (see Fig. 5). Gauss' theorem (to be discussed below) shows that if the mass at the origin is not point-like, but is spherically symmetric, then outside the radius of the mass distribution the same results still hold.
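The relation between eqns 4 and 5 can be checked numerically. The following is a minimal sketch, not a definitive implementation: the function names are hypothetical, the gradient is estimated with a central finite difference whose step `h` is an arbitrary choice, and the values of $G$ and of the Earth's mass are the ones quoted in Example 0.3.

```python
import math

G = 6.6741e-11  # gravitational constant in SI units, value quoted in Example 0.3

def potential(r_vec, M):
    """Scalar potential Phi = -G M / r of a point mass M at the origin."""
    r = math.sqrt(sum(x * x for x in r_vec))
    return -G * M / r

def field(r_vec, M):
    """Vector field of eqn 4: magnitude G M / r^2, pointing towards the origin."""
    r = math.sqrt(sum(x * x for x in r_vec))
    return tuple(-G * M * x / r**3 for x in r_vec)

def minus_grad_potential(r_vec, M, h=100.0):
    """Central finite-difference estimate of -grad(Phi); h is a step in metres."""
    components = []
    for axis in range(3):
        forward = list(r_vec)
        backward = list(r_vec)
        forward[axis] += h
        backward[axis] -= h
        slope = (potential(forward, M) - potential(backward, M)) / (2 * h)
        components.append(-slope)
    return tuple(components)

M_earth = 5.97e24                  # kg, value quoted in Example 0.3
position = (4.0e6, 3.0e6, 4.0e6)   # an arbitrary point about 6400 km from the origin
print(field(position, M_earth))
print(minus_grad_potential(position, M_earth))
```

The two printed vectors should agree closely, component by component, which is the content of $\vec{g}=-\vec{\nabla} \Phi$.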

Example 0.3

For a test mass on the surface of Earth, the gravitational force is $F=m g$, where $g=9.81 \mathrm{~m} \mathrm{~s}^{-2}$. Following the Cavendish experiment${ }^{16}$ of 1798 (and later improvements on it), the gravitational constant $G$ was measured to be $6.6741 \times 10^{-11} \mathrm{~N} \mathrm{~kg}^{-2} \mathrm{~m}^{2}$. Cavendish described this experiment as 'weighing the world' because we can then use eqn 4 to deduce that

$$
\begin{equation*}
M_{\oplus}=\frac{g R_{\oplus}^{2}}{G}, \tag{6}
\end{equation*}
$$

where $R_{\oplus}=6.378 \times 10^{6} \mathrm{~m}$ is the radius of the Earth. This then gives the mass of the Earth as $M_{\oplus}=5.97 \times 10^{24} \mathrm{~kg}$. Here we are using the commonly used symbol $\oplus$ to denote the Earth. With these two numbers, we can also work out the mean density of the Earth by dividing mass $M_{\oplus}$ by volume $\frac{4}{3} \pi R_{\oplus}^{3}$, which yields a value of $5.5 \times 10^{3} \mathrm{~kg} \mathrm{~m}^{-3}$, just over a factor of 5 greater than water.
We can play a similar game with our nearest star, the Sun. The Earth's orbit around the Sun is elliptical, but it's not far from circular, so for an estimate we can equate the gravitational force on the Earth due to the Sun $G M_{\odot} M_{\oplus} / R_{\mathrm{ES}}^{2}$ to the centripetal force $M_{\oplus} v^{2} / R_{\mathrm{ES}}$, where $v$ is the speed of the Earth, $M_{\odot}$ is the mass of the Sun ($\odot$ being the symbol we use for denoting the Sun) and $R_{\mathrm{ES}}$, the separation of the Sun and Earth, is called the astronomical unit (abbreviated A.U.). The value of $R_{\mathrm{ES}}$ was first estimated by the Greeks by measuring the angle between a half-moon and the Sun (see Fig. 6), although subsequently improved in the seventeenth century and later by measuring the solar parallax using the transit of Venus. A modern value is $1.496 \times 10^{11} \mathrm{~m}$. The period $\tau$ of the circular orbit is related to $v$ and $R_{\mathrm{ES}}$ by $\tau=2 \pi R_{\mathrm{ES}} / v$, where our equating of centripetal and gravitational forces yields $v^{2}=G M_{\odot} / R_{\mathrm{ES}}$. We can hence deduce from $\tau=1$ year that $M_{\odot}=1.99 \times 10^{30} \mathrm{~kg}$. The density of the Sun, using $R_{\odot}=6.96 \times 10^{8} \mathrm{~m}$, then works out to be around $1.4 \times 10^{3} \mathrm{~kg} \mathrm{~m}^{-3}$, just a bit larger than that of water.
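The arithmetic of these two estimates is easy to reproduce. The sketch below uses only the numbers quoted above, apart from the length of the year, which is assumed here to be 365.25 days; the variable and function names are illustrative.

```python
import math

G = 6.6741e-11        # N kg^-2 m^2, value quoted in the text
g_surface = 9.81      # m s^-2, surface gravity of the Earth
R_earth = 6.378e6     # m
R_sun = 6.96e8        # m
R_ES = 1.496e11       # m, the astronomical unit
year = 365.25 * 24 * 3600  # s, assumed length of one year

def mean_density(mass, radius):
    """Mass divided by the volume of a sphere of the given radius."""
    return mass / (4.0 / 3.0 * math.pi * radius**3)

# 'Weighing the world' with eqn 6.
M_earth = g_surface * R_earth**2 / G

# Circular-orbit estimate: tau = 2 pi R_ES / v and v^2 = G M_sun / R_ES.
v_earth = 2 * math.pi * R_ES / year
M_sun = v_earth**2 * R_ES / G

print(f"M_earth = {M_earth:.3e} kg, density = {mean_density(M_earth, R_earth):.2e} kg m^-3")
print(f"v_earth = {v_earth:.3e} m/s")
print(f"M_sun   = {M_sun:.3e} kg, density = {mean_density(M_sun, R_sun):.2e} kg m^-3")
```

The results should land within about a per cent of the rounded values quoted in the text, the small differences reflecting the rounding of the inputs.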
One conclusion from all of this is that, from a nineteenth-century perspective, the idea of a black hole (an object with such intense surface gravity that even light could not escape) seems highly unlikely. The escape velocity $v_{\text {esc }}$ from a spherical object of radius $R$, mass $M=\frac{4}{3} \pi \rho R^{3}$ and density $\rho$ is simply worked out by equating the kinetic energy $\frac{1}{2} m v_{\text {esc }}^{2}$ of a launching test mass to the depth $m|\Phi|=G M m / R$ of the potential energy well in which it starts its journey. This yields $v_{\text {esc }}=\sqrt{2 G M / R}=R \sqrt{8 \pi G \rho / 3}$. This result scales linearly with $R$ and would reach $v_{\mathrm{esc}}=c$ only when

$$
\begin{equation*}
R=c \sqrt{\frac{3}{8 \pi G \rho}} . \tag{7}
\end{equation*}
$$
Since the best-studied objects in the Universe were those in our own Solar System, and these have mean densities that don't exceed that of water by more than a factor of about five, and since the rest of the Universe seems to be filled with stars that look somewhat similar to the Sun, then eqn 7 would only be likely to be satisfied by an object with radius of more than an astronomical unit, the distance from the Earth to the Sun. No normal stars were thought to be this big. Thus, it didn't seem as if eqn 7 would hold.${ }^{17}$
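To see the scale involved, one can evaluate the radius at which the Newtonian escape velocity of a uniform-density sphere reaches $c$. Substituting $M=\frac{4}{3} \pi \rho R^{3}$ into $v_{\text {esc }}^{2}=2 G M / R$ gives $v_{\text {esc }}=R \sqrt{8 \pi G \rho / 3}$, so the condition $v_{\text {esc }}=c$ is met at $R=c \sqrt{3 /(8 \pi G \rho)}$. The sketch below evaluates this for the two mean densities found earlier; the value of $c$ is an assumed standard value, and the function names are illustrative.

```python
import math

G = 6.6741e-11   # N kg^-2 m^2, value quoted in the text
c = 2.998e8      # m s^-1, assumed standard value of the speed of light
AU = 1.496e11    # m, Earth-Sun separation quoted in the text

def escape_velocity(mass, radius):
    """Newtonian escape velocity sqrt(2 G M / R)."""
    return math.sqrt(2 * G * mass / radius)

def critical_radius(density):
    """Radius of a uniform sphere of the given density for which v_esc = c."""
    return c * math.sqrt(3.0 / (8.0 * math.pi * G * density))

for label, rho in (("Sun-like density", 1.4e3), ("Earth-like density", 5.5e3)):
    R = critical_radius(rho)
    M = 4.0 / 3.0 * math.pi * rho * R**3
    print(f"{label}: R = {R:.2e} m = {R / AU:.2f} A.U., "
          f"v_esc / c = {escape_velocity(M, R) / c:.6f}")
```

Both radii come out larger than one astronomical unit, which is why such objects looked implausible from a nineteenth-century point of view.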
Because $\vec{g}$ points inwards to any point mass $M$ at the origin, we deduce that the integral of $\vec{g}$ over any spherical surface $S$ of radius $R$ surrounding the origin is

$$
\begin{equation*}
\int_{S} \vec{g} \cdot \mathrm{~d} \vec{S}=-\frac{G M}{R^{2}} \cdot 4 \pi R^{2}=-4 \pi G M . \tag{8}
\end{equation*}
$$

The divergence theorem is a result from vector calculus and says that the left-hand side of this equation, a surface integral of the flux of the vector $\vec{g}$ out of the surface, can be rewritten as an integral over the volume of the divergence of $\vec{g}$, written as $\vec{\nabla} \cdot \vec{g}$. Hence, we have

$$
\begin{equation*}
\int_{S} \vec{g} \cdot \mathrm{~d} \vec{S}=\int_{V} \vec{\nabla} \cdot \vec{g} \mathrm{~d} V, \tag{9}
\end{equation*}
$$

where here the volume element $\mathrm{d} V=\mathrm{d}^{3} r$, i.e. the gravitational flux out of a surface is equal to the integral of the divergence of the gravitational field inside the volume enclosed by the surface. From this, we can deduce that${ }^{18}$

$$
\begin{equation*}
\vec{\nabla} \cdot \vec{g}=-4 \pi G M \delta(\vec{r}) . \tag{12}
\end{equation*}
$$
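A numerical experiment makes the flux statement concrete. The sketch below is an illustration under stated assumptions: it replaces the sphere of eqn 8 by a cube (any closed surface should do), places the point mass off-centre, works in units where $G M=1$, and estimates the surface integral with a simple midpoint rule. The function name and grid size are arbitrary choices.

```python
import math

def flux_through_cube(GM, source, half_side=1.0, n=100):
    """Midpoint-rule estimate of the outward flux of g through a cube.

    The cube is centred on the origin with faces at +/- half_side.
    g is the field of a point mass at `source`:
    g = -GM (r - source) / |r - source|^3.
    """
    h = 2.0 * half_side / n          # side of one square surface patch
    total = 0.0
    for axis in range(3):            # faces with normal along x, y, z
        for sign in (-1.0, 1.0):     # the two opposite faces
            for i in range(n):
                for j in range(n):
                    u = -half_side + (i + 0.5) * h
                    v = -half_side + (j + 0.5) * h
                    point = [u, v]
                    point.insert(axis, sign * half_side)
                    d = [point[k] - source[k] for k in range(3)]
                    dist = math.sqrt(d[0]**2 + d[1]**2 + d[2]**2)
                    # Component of g along the outward normal of this face.
                    g_normal = -GM * d[axis] / dist**3 * sign
                    total += g_normal * h * h
    return total

flux_inside = flux_through_cube(1.0, (0.3, -0.2, 0.1))   # mass enclosed
flux_outside = flux_through_cube(1.0, (2.0, 0.0, 0.0))   # mass not enclosed
print(flux_inside, -4 * math.pi)
print(flux_outside)
```

With the mass inside the cube the flux should be close to $-4 \pi G M$ regardless of where inside it sits, and with the mass outside it should be close to zero, as expected for a divergence concentrated at the position of the mass.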
The Newtonian results for a point mass can be generalized for the field due to a distribution of mass since Newtonian theory is linear. Thus, for example,

$$
\begin{equation*}
\Phi(\vec{r})=\int-\frac{G \rho\left(\vec{r}^{\prime}\right)}{\left|\vec{r}-\vec{r}^{\prime}\right|} \mathrm{d}^{3} r^{\prime} \tag{13}
\end{equation*}
$$
Fig. 6 Diagram (not to scale) showing how the distance to the Sun can be estimated by measuring the angle between the Sun and the half-Moon. The distances are given in A.U. The calculation relies on an estimate of the distance to the Moon which can be estimated from measurements of lunar parallax.

The Earth $\oplus$

Mass: $M_{\oplus}=5.97 \times 10^{24} \mathrm{~kg}$
Radius: $R_{\oplus}=6.378 \times 10^{6} \mathrm{~m}$

The Sun $\odot$

Mass: $M_{\odot}=1.99 \times 10^{30} \mathrm{~kg}$
Radius: $R_{\odot}=6.96 \times 10^{8} \mathrm{~m}$

$$
\begin{aligned}
& \frac{M_{\odot}}{M_{\oplus}}=3.33 \times 10^{5} \\
& \frac{R_{\odot}}{R_{\oplus}}=1.09 \times 10^{2}
\end{aligned}
$$
${ }^{17}$ As we shall see later, it is possible to have compact objects such as neutron stars which have enormous densities. This only became possible to understand after the development of quantum mechanics.
${ }^{18}$ The Dirac delta function $\delta(x)$ is a function localized at the origin and which has integral unity. It is the perfect model of a localized particle, and is used here to fix the point mass $M$ at the origin. We have written a three-dimensional delta function $\delta(\vec{r}) \equiv \delta(x) \delta(y) \delta(z)$, often denoted $\delta^{(3)}(\vec{x})$. The integral of a $d$-dimensional Dirac delta function $\delta^{(d)}(\vec{x})$ is given by

$$
\begin{equation*}
\int \mathrm{d}^{d} x \, \delta^{(d)}(\vec{x})=1 . \tag{10}
\end{equation*}
$$

It is defined by

$$
\begin{equation*}
\int \mathrm{d}^{d} x \, f(\vec{x}) \delta^{(d)}(\vec{x})=f(0) . \tag{11}
\end{equation*}
$$

${ }^{19}$ An arbitrary distribution of mass can be written as an integral of point masses using

$$
\rho(\vec{r}) \equiv \int \rho\left(\vec{r}^{\prime}\right) \delta\left(\vec{r}-\vec{r}^{\prime}\right) \mathrm{d}^{3} r^{\prime} .
$$

The integral form of eqn 12 is

$$
\int_{S} \vec{g} \cdot \mathrm{~d} \vec{S}=-4 \pi G M,
$$
where $M=\int \rho\left(\vec{r}^{\prime}\right) \mathrm{d}^{3} r^{\prime}$ is the total mass enclosed inside the surface $S$. This result, which generalizes eqn 8 to an arbitrary distribution of mass, is often known as Gauss' theorem for gravitational fields.
${ }^{20}$ In electrostatics, the force on charge $q$ is $\vec{F}=q \vec{E}$ where $\vec{E}=-\vec{\nabla} \phi$ is the electric field and $\phi$ is the electrostatic potential. Gauss' theorem for electrostatics is (in SI units)

$$
\int_{S} \vec{E} \cdot \mathrm{~d} \vec{S}=Q / \epsilon_{0},
$$

where $Q$ is the charge enclosed by the surface $S$, and $\vec{\nabla} \cdot \vec{E}=\rho / \epsilon_{0}$ where $\rho$ here is the charge density, and

$$
\nabla^{2} \phi=-\rho / \epsilon_{0}
$$

is Poisson's equation.
${ }^{21}$ Carlyle may have said that 'Our Theory of Gravitation is as good as perfect...' in the quote that opened the chapter, but this discrepancy turned out to be rather significant!
is the gravitational potential at position $\vec{r}$ from a distribution of masses with density $\rho\left(\vec{r}^{\prime}\right)$. The divergence of the gravitational field can then be written (generalizing eqn 12) as${ }^{19}$

$$
\begin{equation*}
\vec{\nabla} \cdot \vec{g}(\vec{r})=-4 \pi G \rho(\vec{r}) . \tag{14}
\end{equation*}
$$

Equivalently, this can be written using the gravitational potential $\Phi$, via $\vec{g}=-\vec{\nabla} \Phi$, to yield

$$
\begin{equation*}
\nabla^{2} \Phi=4 \pi G \rho, \tag{15}
\end{equation*}
$$

which is analogous to Poisson's equation in electrostatics.${ }^{20}$
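Eqn 15 can be tested on a case where the potential is known in closed form. The sketch below assumes a sphere of uniform density (a deliberate idealization, applied here to the Earth's mass and radius), for which the standard result is $\Phi=-G M / r$ outside and $\Phi=-G M\left(3 R^{2}-r^{2}\right) /\left(2 R^{3}\right)$ inside. The Laplacian is estimated with a seven-point finite difference; the function names and the step `h` are illustrative choices.

```python
import math

G = 6.6741e-11   # N kg^-2 m^2, value quoted in the text

def uniform_sphere_potential(point, M, R):
    """Potential of a uniform-density sphere of mass M and radius R."""
    r = math.sqrt(sum(x * x for x in point))
    if r >= R:
        return -G * M / r
    return -G * M * (3 * R**2 - r**2) / (2 * R**3)

def laplacian(phi, point, h):
    """Seven-point finite-difference estimate of the Laplacian of phi."""
    total = -6.0 * phi(point)
    for axis in range(3):
        for sign in (-1.0, 1.0):
            shifted = list(point)
            shifted[axis] += sign * h
            total += phi(shifted)
    return total / h**2

M, R = 5.97e24, 6.378e6                    # Earth values, treated as a uniform sphere
rho = M / (4.0 / 3.0 * math.pi * R**3)

def phi(point):
    return uniform_sphere_potential(point, M, R)

lap_inside = laplacian(phi, (1.0e6, 2.0e6, 3.0e6), h=1.0e4)
lap_outside = laplacian(phi, (5.0e6, 5.0e6, 5.0e6), h=1.0e4)
print(lap_inside, 4 * math.pi * G * rho)   # these two should agree
print(lap_outside)                         # should be very close to zero
```

Inside the sphere the estimate should match $4 \pi G \rho$, and outside, where $\rho=0$, it should be negligible by comparison.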
The dimensions of the gravitational potential $\Phi$ are (velocity)${ }^{2}$ so one might wonder what happens when $|\Phi|$ becomes of the same order as $c^{2}$, where $c$ is the speed of light. This would be equivalent to the size of the gravitational potential energy $m|\Phi|$ of a mass $m$ becoming of the same order as the rest mass energy $m c^{2}$. This gives a rough criterion for when Newton's law of gravitation is likely to break down and the effects of general relativity to become extremely important. However, as we shall see in this book, the effects of general relativity can become significant even before this point is reached.

Example 0.4

  • Effects such as gravitational time dilation are certainly measurable, if not dramatic, on the surface of planet Earth (where $|\Phi| \lll c^{2}$) and are important in accurate operation of the global positioning system (GPS).
  • The orbit around the Sun of Mercury, the innermost planet in the Solar System, gives Mercury an orbital speed larger than that of any other planet (though at $47 \mathrm{~km} \mathrm{~s}^{-1}$ it's less than $0.0002 c$ and so you wouldn't have thought relativistic effects would be that important). The axes of its orbit precess slightly, by about 575 arcseconds per century, and most of this (about 532 arcseconds per century) is due to the gravitational effects of other bodies in the Solar System, perfectly calculable by Newtonian gravity. However, despite careful calculations, a discrepancy${ }^{21}$ of about 43 arcseconds per century stubbornly resisted explanation, until Einstein's general relativity came to the rescue.
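The rough criterion $|\Phi| \sim c^{2}$ can be put into numbers. The sketch below evaluates $|\Phi| / c^{2}=G M /\left(R c^{2}\right)$ at the surfaces of the Earth, the Sun and the $1.4 M_{\odot}$, 10 km neutron star of Exercise 0.2, and estimates the same ratio at Mercury's orbit as $(v / c)^{2}$, using $v^{2}=G M_{\odot} / r$ for a circular orbit (an approximation, since Mercury's orbit is not circular). The value of $c$ is an assumed standard value.

```python
G = 6.6741e-11   # N kg^-2 m^2, value quoted in the text
c = 2.998e8      # m s^-1, assumed standard value of the speed of light

def potential_over_c2(mass, radius):
    """Dimensionless ratio |Phi| / c^2 = G M / (R c^2) at distance R from mass M."""
    return G * mass / (radius * c**2)

M_sun = 1.99e30
ratios = {
    "Earth surface": potential_over_c2(5.97e24, 6.378e6),
    "Sun surface": potential_over_c2(M_sun, 6.96e8),
    "Neutron star surface": potential_over_c2(1.4 * M_sun, 1.0e4),
    # Circular-orbit estimate: |Phi| = G M / r = v^2, with v = 47 km/s.
    "Mercury's orbit": (47.0e3 / c) ** 2,
}

for place, ratio in ratios.items():
    print(f"{place:22s} |Phi|/c^2 = {ratio:.2e}")
```

Only the neutron star comes anywhere near the regime $|\Phi| \sim c^{2}$; at the Earth and at Mercury the ratio is tiny, which is why the relativistic effects listed above are small but, with precise enough measurements, still detectable.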

0.5 Who is this book for?

As with our earlier book on quantum field theory, our imagined reader is an amateur. We have written this book for someone wanting to learn general relativity without (at least initially) joining the ranks of professional relativists; but (s)he is gifted, possessing a curious and adaptable mind and willing to embark on a significant intellectual challenge; (s)he has abundant curiosity about the physical world, a basic grounding in undergraduate physics, and a desire to be told an entertaining and intellectually stimulating story, but will not feel patronized if a few mathematical niceties are spelled out in detail. We appreciate that some readers will want to get to the physical predictions of the theory as soon as possible, as their primary concern is with understanding what the Universe is actually like. Others will have more interest in the mathematical structure of the theory; such readers will want to know more about how some more advanced geometric formalism can yield additional insights. We have tried to cater for both types of readers and have designed the book so that it is possible to dip in and out of sections that may be more or less to a reader's taste, though we recommend all beginners to persevere with at least the first thirteen chapters.
The book is structured as follows. We begin in Part I with an introduction to the geometry of flat spacetime, reviewing special relativity and setting up the mathematics of the metric. Part II introduces the mathematics of curvature and sets up the physics of general relativity and finishes with the Einstein field equation. Part III applies these ideas to the Universe and studies various models used in cosmology. Part IV turns to smaller structures inside the Universe: stars, black holes and their orbits. Part V contains a more formal treatment of geometry which may be of more interest to those with more mathematical inclinations. Part VI considers general relativity as a type of field theory and examines how one might link the ideas in our best theory of gravitation to our most successful theories of quantum fields. Before we get going, we will say a few words about units.

0.6 Units in this book

Most readers will be familiar with SI units and we will begin the book using them. However, once we get going, we will switch over to what are known as geometrized units in which we set $G=c=1$.${ }^{22}$ This has the great advantage of simplifying equations into more memorable forms since they will no longer be encumbered with unnecessary factors of $c$ and $G$ whose presence, to the experts, is 'obvious'. It of course has the great disadvantage of creating some confusion whenever a numerical result is needed, but after a bit of practice this does become second nature. Because of the potential confusion for newcomers to the field, we will frequently translate back to SI units (which we will tend to call 'real-world' units) when we need to. Here is an explanation of how to translate between the two systems.

Example 0.5

Conversion factors to convert from quantities expressed in real-world units into geometrized units can be computed by noting that the dimension${ }^{23}$ of $c$ is $\mathrm{L} / \mathrm{T}$ in the real world, while the dimension of $G / c^{2}$ is $\mathrm{L} / \mathrm{M}$. To convert a quantity with real-world dimension time into geometrized units, multiply by $c$. To convert a quantity with real-world dimension mass, multiply by $G / c^{2}$. Both of these quantities then have units of length in the geometrized system.
$\curvearrowright$ It is certainly not necessary to read the book in order. In fact, we would recommend skipping several sections on a first reading. Boxes like this one are intended to allow you to navigate a path through the text.
${ }^{22}$ The reader will be let in gently to geometrized units. We will not begin to set $c=1$ until Chapter 2, and will not set $G=1$ as well until Part III. In addition, Appendix B contains a summary of the units we use to discuss electromagnetism, along with a summary of useful notation.
${ }^{23}$ We denote dimension of length by L, time by T and mass by M.
The generalized version of the above argument says that if a quantity has units $\mathrm{L}^{n} \mathrm{~T}^{m} \mathrm{M}^{p}$ in the real world, then it has units $\mathrm{L}^{n+m+p}$ in the geometrized system and the conversion factor is $c^{m}\left(G / c^{2}\right)^{p}$. The table gives some examples.
| Quantity | real world | geometrized | conversion |
| :--- | :---: | :---: | :---: |
| Length | L | L | 1 |
| Time | T | L | $c$ |
| Mass | M | L | $G / c^{2}$ |
| Velocity | $\mathrm{L} \mathrm{T}^{-1}$ | 1 | $c^{-1}$ |
| Energy | $\mathrm{L}^{2} \mathrm{~T}^{-2} \mathrm{M}$ | L | $G / c^{4}$ |
| Energy density | $\mathrm{L}^{-1} \mathrm{~T}^{-2} \mathrm{M}$ | $\mathrm{L}^{-2}$ | $G / c^{4}$ |
| Mass density | $\mathrm{L}^{-3} \mathrm{M}$ | $\mathrm{L}^{-2}$ | $G / c^{2}$ |
| Pressure | $\mathrm{L}^{-1} \mathrm{~T}^{-2} \mathrm{M}$ | $\mathrm{L}^{-2}$ | $G / c^{4}$ |
If you want to convert an equation expressed in geometrized units into real-world units, multiply the quantities in the table by their respective conversion factors.
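The general rule, conversion factor $c^{m}\left(G / c^{2}\right)^{p}$ for a quantity with real-world dimension $\mathrm{L}^{n} \mathrm{~T}^{m} \mathrm{M}^{p}$, fits in a one-line function. The sketch below is illustrative only: the function name `to_geometrized` is hypothetical, the value of $c$ is an assumed standard value, and $G$ and the solar mass are the values quoted earlier in the text.

```python
G = 6.6741e-11   # N kg^-2 m^2, value quoted in the text
c = 2.998e8      # m s^-1, assumed standard value of the speed of light

def to_geometrized(value, n, m, p):
    """Convert an SI value with dimension L^n T^m M^p to geometrized units.

    The result has units of metres to the power (n + m + p).
    """
    return value * c**m * (G / c**2) ** p

sun_mass_in_metres = to_geometrized(1.99e30, n=0, m=0, p=1)
one_second_in_metres = to_geometrized(1.0, n=0, m=1, p=0)
speed_of_light = to_geometrized(c, n=1, m=-1, p=0)

print(f"Solar mass: {sun_mass_in_metres:.0f} m")
print(f"One second: {one_second_in_metres:.3e} m")
print(f"Speed of light: {speed_of_light}")
```

In these units the mass of the Sun is about 1.5 km and the speed of light is the pure number 1, in line with the 'Mass' and 'Velocity' rows of the table.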

Example 0.6

In geometrized units, the Einstein equation is $\boldsymbol{G}=8 \pi \boldsymbol{T}$, where $\boldsymbol{G}$ and $\boldsymbol{T}$ are tensors that we will define later in the book (and should not be confused with the gravitational constant $G$ and temperature $T$). The left-hand side of the Einstein equation has units $\mathrm{L}^{-2}$, and the right-hand side has units of energy density ($\mathrm{L}^{-1} \mathrm{~T}^{-2} \mathrm{M}$). The left-hand side is multiplied by unity; the right by $G / c^{4}$ and we obtain

$$
\begin{equation*}
\boldsymbol{G}=\frac{8 \pi G}{c^{4}} \boldsymbol{T} \quad \text { (real world). } \tag{16}
\end{equation*}
$$

Exercises

(0.1) Show using Newtonian theory that the escape velocity from the surface of a star of mass $M$ and radius $r$ is $v_{\text {esc }}=\sqrt{2 G M / r}=\sqrt{2|\Phi|}$. Show that the condition $v_{\text {esc }}=c$ will occur if $r=2 G M / c^{2}$, which is known as the Schwarzschild radius.
(0.2) Estimate the surface gravity $g$ and the escape velocity $v_{\text {esc }}$ for (i) the surface of the Earth ($R_{\oplus}=6.378 \times 10^{6} \mathrm{~m}$, $M_{\oplus}=5.97 \times 10^{24} \mathrm{~kg}$), (ii) the surface of the Sun ($R_{\odot}=6.96 \times 10^{8} \mathrm{~m}$, $M_{\odot}=1.99 \times 10^{30} \mathrm{~kg}$), and (iii) the surface of a $1.4 M_{\odot}$ neutron star with radius 10 km.
(0.3) Evaluate the tidal force (the difference in gravitational forces from one end [head] to the other [feet]) on a 1.8 m tall human being (i) standing on the Earth, (ii) at the Schwarzschild radius of a $3 M_{\odot}$ black hole with her body aligned in a radial direction, and (iii) the same as (ii) but for a $10^{6} M_{\odot}$ black hole.

Part I

Geometry and mechanics in flat spacetime

In this introductory part of the book, we trace the development of the picture of the Universe which underpins relativity.
  • Based upon the principle that light travels at $c$ in all inertial frames, we describe the geometry of spacetime in Chapter 1 and show that the consequences that stem from this are surprisingly far-reaching.
  • In Chapter 2, we show how vectors are treated in special relativity and how the dynamics of particles in flat spacetime can be obtained from the principle of least action.
  • Chapter 3 is concerned with coordinates. Sometimes we choose a geometric, coordinate-free approach, but often we have to choose a particular coordinate system. We consider Cartesian and non-Cartesian bases and how to transform from one to the other.
  • We introduce tensors in Chapter 4, describing them as 'linear slot machines' into which you insert a number of vectors and their dual objects which are called 1-forms; the slot machine then spits out a number. Vectors and 1-forms are themselves both tensors, as is the energy-momentum tensor which we also introduce.
  • In Chapter 5, we consider a very special tensor: the metric tensor. The metric tensor encodes information about the spacetime, how distance is measured and also whether the spacetime is curved.

1

1.1 A common sense start
1.2 The speed of light
1.3 Light cones and the Lorentz transformation
1.5 Experiments
Chapter summary
Exercises
Fig. 1.1 Spacetime diagram for our naive conception of the past, present and future. In particular, the present 'now' is represented by a horizontal line.
$^{1}$ There might need to be a few calculations made to correct for light-travel-time-effects (estimating the time delay in getting signals from you and your aunt to the space station), but after doing this it will make perfect sense for everyone to talk about those two sandwich-biting events occurring at precisely the same instant.
$^{2}$ Although for the latter case we will need to be sent a signal of when the train left Paris, and will have to make a correction for the time taken for the signal to get to us.

Special relativity

Nowadays most people die of a sort of creeping common sense, and discover when it is too late that the only things one never regrets are one's mistakes.
Oscar Wilde (1854-1900) The picture of Dorian Gray

1.1 A common sense start

Einstein revolutionized our thinking about reality. To appreciate why, let's start by confirming some basic, obvious notions that would be self-evident to anyone who hadn't been exposed to Einstein's ideas. These are so straightforward that they might seem unnecessary to state, but we will do so because they turn out, in fact, to be wrong.
(1) The notion of now: For a start, we all understand how time rolls on inexorably for all of us. We all live in 'now', we leave the 'past' behind, and march into the 'future'. This is something we all experience, and as we look out of the window we see what others in the world are doing right now. Of course, if we train our telescopes on a distant galaxy, we might be observing it as it was, several billion years ago. But that's just a light-travel-time-effect. We can sensibly talk about what the inhabitants of the Andromeda galaxy might be doing right now, even if we can't see them. We could draw a spacetime diagram of this picture of reality and it would look like the one in Fig. 1.1.
(2) The notion of simultaneity: Because time is a quantity that we all experience identically (we all march to the same beat of the drum) you can make statements about simultaneity, such as 'at the exact same moment that I took my first bite of the sandwich in London, my aunt in Melbourne took the first bite of her sandwich'. We expect this statement to be universally true, agreed upon by all observers, so that if it is true for you and your aunt, it will be true for an observer of whatever standpoint (even if they are on the International Space Station).$^{1}$
(3) Time intervals: Next, if we measure the time that something lasts, like a particular journey from Paris to Strasbourg, then we will get the same answer whether we are on the train or standing at Strasbourg station.$^{2}$ Moreover, the rate at which time elapses surely doesn't depend on your altitude above sea level. It would be ridiculous for time to go at a different rate on the top floor of a building than in the basement. Time intervals are therefore something that everyone can agree on, irrespective of their frame of reference.
(4) Spatial intervals: Moreover, intervals in space are similarly universal. If you measure the length of a moving train carriage as a passenger you should get the same answer as an observer standing on the station platform.$^{3}$ Again, completely self-evident.
These concepts are all intuitively obvious. It was Einstein's particular genius to understand that, amazingly, our 'common sense' intuition is at fault and that these supposedly self-evident concepts must be abandoned.

1.2 The speed of light

By the start of the twentieth century, physicists were faced with a series of rather profound questions about how light propagates that put many accepted notions of physics at risk.$^{4}$ Einstein was motivated by wanting to save Maxwell's equations of electromagnetism, which showed that the speed of light, $c$, could be related to electric and magnetic constants via the famous equation linking $c$ to free space's permittivity $\epsilon_{0}$ and permeability $\mu_{0}$:
$$c=\frac{1}{\sqrt{\epsilon_{0} \mu_{0}}}. \tag{1.1}$$
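As a quick numerical check of eqn 1.1, the following minimal Python sketch evaluates $c$ from the two constants. The numerical values of $\epsilon_{0}$ and $\mu_{0}$ are the CODATA 2018 recommended values, which are an assumption brought in here (the text does not quote them), and the function name `speed_of_light` is purely illustrative.

```python
import math

# CODATA 2018 values, quoted here as an assumption (they are not given in the text).
EPSILON_0 = 8.8541878128e-12  # permittivity of free space, F/m
MU_0 = 1.25663706212e-6       # permeability of free space, N/A^2


def speed_of_light(epsilon_0: float = EPSILON_0, mu_0: float = MU_0) -> float:
    """Evaluate eqn 1.1: c = 1 / sqrt(epsilon_0 * mu_0), in m/s."""
    return 1.0 / math.sqrt(epsilon_0 * mu_0)


print(f"c = {speed_of_light():.6e} m/s")
```

The result agrees with the defined value $c = 299\,792\,458\ \mathrm{m\,s^{-1}}$ to within the precision of the constants used.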
But speed is a relative quantity. A car travels at 50 miles per hour with respect to the road. What does light travel with respect to? If you are travelling at speed $c/2$ with respect to a laser which emits a beam of light travelling in the opposite direction to you, does the light travel with respect to you at a speed $\frac{c}{2}-(-c)=\frac{3c}{2}$? If you then measured the speed of light to be $\frac{3c}{2}$, how could you reconcile that with Maxwell's equations? Einstein concluded that eqn 1.1 was a universal concept and that the speed of light was the same for all observers in all inertial reference frames.$^{5}$ The consequences of this bold assumption on spacetime geometry are far-reaching. Before we get to these, let's start with a review of some notions of ordinary geometry.
Example 1.1
The two-dimensional $xy$ plane is shown in Fig. 1.2(a). The point $(x, y)$ is a distance $d=\sqrt{x^{2}+y^{2}}$ from the origin. If we rotate the coordinates [Fig. 1.2(b)] so that $x \rightarrow x^{\prime}$ and $y \rightarrow y^{\prime}$, we want this distance to be unchanged, so that $x^{2}+y^{2}=x^{\prime 2}+y^{\prime 2}$. A linear transform that accomplishes this is given by
$$\begin{pmatrix} x^{\prime} \\ y^{\prime} \end{pmatrix}=\begin{pmatrix} \cos \theta & \sin \theta \\ -\sin \theta & \cos \theta \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix}, \tag{1.2}$$
which works because $\cos ^{2} \theta+\sin ^{2} \theta=1$. The matrix in this equation is known as a rotation matrix. If you ask what is the set of points which are equidistant from the origin then, obviously, you will end up with concentric circles centred on the origin. The shortest distance between the origin and a point $(x, y)$ is, of course, a straight line and that straight line will intersect with all of those circles at right angles.
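The invariance of distance under the rotation of eqn 1.2 is easy to confirm numerically. The short Python sketch below applies the rotation matrix to a sample point and compares the distance from the origin before and after. The sample point, the angle and the function name `rotate` are arbitrary choices made for illustration.

```python
import math


def rotate(x: float, y: float, theta: float) -> tuple[float, float]:
    """Apply the rotation matrix of eqn 1.2 to the point (x, y)."""
    x_new = math.cos(theta) * x + math.sin(theta) * y
    y_new = -math.sin(theta) * x + math.cos(theta) * y
    return x_new, y_new


x, y = 3.0, 4.0            # arbitrary sample point, a distance 5 from the origin
theta = math.radians(30)   # arbitrary rotation angle
x_p, y_p = rotate(x, y, theta)

# The two distances agree up to floating-point rounding.
print(math.hypot(x, y), math.hypot(x_p, y_p))
```

Any other choice of angle gives the same distance, which is the defining property we will want to carry over to spacetime.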
$^{3}$ It is easier to make the measurement as a passenger on the train (just run a very long tape measure from one end to the other). On the platform, you would need to measure where the front of the moving train and the back of the train are at some simultaneous instant. Harder to do in practice, but perfectly possible in principle. You would naively expect to get the same answer in both cases.
$^{4}$ Specifically, the assumption that light was a mechanical wave propagating in an ether raised several troubling issues. See the book by Cheng (Appendix A) for the history.
$^{5}$ Reminder: An inertial reference frame, or inertial frame, is a reference frame that is not accelerating.

Fig. 1.2 The $xy$ plane. The distance between a point $(x, y)$ and the origin is $d$ and is (of course) independent of whether the coordinates used are (a) $x$ and $y$ or (b) the rotated $x^{\prime}$ and $y^{\prime}$.
Fig. 1.3 A light source flashes at the origin at $t=0$ and a spherical wave front, with radius $ct$, expands outward.
$^{6}$ We refer to points in spacetime as events. An event is something which happens at a particular place and particular time: a photon is emitted, a photon is absorbed, a gun is fired, a balloon bursts. Each event is characterized by a single point in spacetime.
$^{7}$ By $\mathrm{d}x^{2}$ we really mean $(\mathrm{d}x)^{2}$, but we write this so often the convention is to leave out the brackets to save on notational clutter.
$^{8}$ We will work with $\mathrm{d}s^{2}$, rather than taking the square root, in order to avoid dealing with square roots of negative numbers. For brevity, people sometimes refer to the square of the interval $\mathrm{d}s^{2}$ as simply 'the interval' (even though strictly the term refers only to $\mathrm{d}s$). Note also that in quantum field theory it is conventional to define $\mathrm{d}s^{2}$ with the opposite sign to what we have done here, i.e. to write
$$\mathrm{d}s^{2} \equiv c^{2}\,\mathrm{d}t^{2}-\mathrm{d}x^{2}-\mathrm{d}y^{2}-\mathrm{d}z^{2},$$
and indeed we have done so in our own Quantum Field Theory for the Gifted Amateur. In this book, we adopt the convention used by most textbooks on general relativity. One might wish there was a common convention between the two fields, but this is the price you pay for exploring a number of topics in physics. It's the same with international motoring: you have to get used to driving on both the left and the right.
$^{9}$ This follows from the fact, discussed above, that if $\mathrm{d}s=0$ in one inertial frame, then $\mathrm{d}s^{\prime}=0$ in any other system.
$^{10}$ That is, $v_{12}=\left|\vec{v}_{1}-\vec{v}_{2}\right|$.
Let's now consider not just space but spacetime. As in the previous example, we want to have some notion of length which is unchanged under a rotation in spacetime (whatever that might mean). How do we define a length? Einstein's postulate gives us a clue, because if a light source flashes at $x=y=z=0$ and $t=0$ it will send out a beam of light travelling at speed $c$ in all directions. There will therefore be a spherical wave front (Fig. 1.3) defined by
$$x^{2}+y^{2}+z^{2}=c^{2} t^{2}. \tag{1.3}$$
Let's now consider two points in spacetime$^{6}$ which are separated only by infinitesimal distances but connected by a light pulse, so that$^{7}$
$$\mathrm{d}x^{2}+\mathrm{d}y^{2}+\mathrm{d}z^{2}=c^{2}\,\mathrm{d}t^{2}. \tag{1.4}$$
Another way of writing this equation is to put the $c^{2}\,\mathrm{d}t^{2}$ on the left-hand side so that the quantity that we will call $\mathrm{d}s^{2}$, the square of the spacetime interval or invariant line element,$^{8}$ is
$$\mathrm{d}s^{2} \equiv-c^{2}\,\mathrm{d}t^{2}+\mathrm{d}x^{2}+\mathrm{d}y^{2}+\mathrm{d}z^{2}=0. \tag{1.5}$$
This has been written using the coordinates of some inertial frame we can call $S$. In another inertial frame $S^{\prime}$ our coordinates will change, but Einstein insists that light travels at the same speed in all inertial frames and so the interval between the same two events is given by
$$\mathrm{d}s^{\prime 2} \equiv-c^{2}\,\mathrm{d}t^{\prime 2}+\mathrm{d}x^{\prime 2}+\mathrm{d}y^{\prime 2}+\mathrm{d}z^{\prime 2}=0, \tag{1.6}$$
or $\mathrm{d}s^{2}=\mathrm{d}s^{\prime 2}$. Remarkably, we can now show in the following example that the spacetime interval is the same in all inertial frames, even if the two points are not connected by a light pulse.

Example 1.2

For intervals separated by infinitesimal distances in space and time, $\mathrm{d}s^{2}$ and $\mathrm{d}s^{\prime 2}$ can be related using some function$^{9}$ $a(v)$ by $\mathrm{d}s^{2}=a(v)\,\mathrm{d}s^{\prime 2}$. Note that $a$ can't be a function of position or time without violating the principle of the homogeneity of spacetime (every point in spacetime is like any other point). The function $a$ can depend on the velocity $\vec{v}$ between frames $S$ and $S^{\prime}$, but can't depend on the direction of $\vec{v}$, only on its magnitude $v=|\vec{v}|$, otherwise it would violate the principle of the isotropy of space (no special directions). Now consider three frames: $S$, $S_{1}$, which moves at a speed $v_{1}$ relative to $S$, and $S_{2}$, which moves at a speed $v_{2}$ relative to $S$. We have $\mathrm{d}s^{2}=a\left(v_{1}\right) \mathrm{d}s_{1}^{2}$ and $\mathrm{d}s^{2}=a\left(v_{2}\right) \mathrm{d}s_{2}^{2}$, but we must also have $\mathrm{d}s_{1}^{2}=a\left(v_{12}\right) \mathrm{d}s_{2}^{2}$, where $v_{12}$ is the relative speed$^{10}$ of $S_{1}$ and $S_{2}$. Comparing, we must have
$$a\left(v_{12}\right)=\frac{a\left(v_{2}\right)}{a\left(v_{1}\right)}. \tag{1.7}$$
However, $v_{12}=\sqrt{v_{1}^{2}+v_{2}^{2}-2 v_{1} v_{2} \cos \theta}$, where $\theta$ is the angle between $\vec{v}_{1}$ and $\vec{v}_{2}$. The left-hand side of eqn 1.7 would therefore depend on $\theta$ while the right-hand side does not, so $a(v)$ cannot depend on $v$ and must be a constant (call it $a$). Eqn 1.7 then becomes $a=a/a$, which is only true if $a=1$. Thus we conclude that, in general,
$$\mathrm{d}s^{2}=\mathrm{d}s^{\prime 2}. \tag{1.8}$$

1.3 Light cones and the Lorentz transformation

This book is about general relativity in which spacetime can be curved, but for this chapter and the next three we will be considering the simplest case$^{11}$ of flat spacetime, in which the geometry considered above extends over all space. Thus, we can consider not just infinitesimal intervals $\mathrm{d}s$ but also the interval $\Delta s$ between more distant points in spacetime, i.e. we can write
$$\Delta s^{2} \equiv-c^{2} \Delta t^{2}+\Delta x^{2}+\Delta y^{2}+\Delta z^{2}. \tag{1.9}$$
A sketch of a spacetime diagram near the origin for this flat spacetime is shown in Fig. 1.4. The set of points that satisfy $\Delta s^{2}=0$ are said to be on the light cone defined by eqn 1.5 since they can be connected to the origin by light rays. Points inside the light cone have $\Delta s^{2}<0$ and can be connected to the origin by particles travelling slower than the speed of light. The light cone actually contains two sections, a past light cone and a future light cone. Physical processes at the origin can be affected by anything on or within the past light cone and processes at the origin can affect anything on or within the future light cone. On the other hand, the set of points outside the light cone (which have $\Delta s^{2}>0$) cannot be causally connected to the origin. We introduce some jargon for these three classes of interval:
$$\begin{array}{ll} \text{Spacelike separation} & \Delta s^{2}>0, \\ \text{Null separation} & \Delta s^{2}=0, \\ \text{Timelike separation} & \Delta s^{2}<0. \end{array} \tag{1.10}$$
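The classification in eqn 1.10 amounts to checking the sign of $\Delta s^{2}$ from eqn 1.9. A minimal Python sketch is given below. The function names are illustrative only, and the examples use units in which $c=1$ (an assumption made for convenience) so that the null case can be tested exactly; with measured data one would compare against a small tolerance instead of exact zero.

```python
C = 299_792_458.0  # speed of light in m/s (default; the examples below use c = 1)


def interval_squared(dt, dx, dy=0.0, dz=0.0, c=C):
    """Squared interval of eqn 1.9 between two events."""
    return -(c * dt) ** 2 + dx**2 + dy**2 + dz**2


def classify(dt, dx, dy=0.0, dz=0.0, c=C):
    """Label the separation of two events according to eqn 1.10."""
    s2 = interval_squared(dt, dx, dy, dz, c)
    if s2 > 0:
        return "spacelike"
    if s2 < 0:
        return "timelike"
    return "null"


# Examples in units where c = 1.
print(classify(dt=1.0, dx=2.0, c=1.0))  # outside the light cone
print(classify(dt=2.0, dx=1.0, c=1.0))  # inside the light cone
print(classify(dt=1.0, dx=1.0, c=1.0))  # on the light cone
```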

Example 1.3

Let's see how to transform between inertial frames. We shall deal with the $xt$ plane as shown in Fig. 1.5. The point $(x, t)$ is now at an interval $\sqrt{-c^{2} t^{2}+x^{2}}$ from the origin. The interval is somewhat like distance in Example 1.1, but the minus sign in the definition will change things. The analogue of rotating the coordinates, mapping $x \rightarrow x^{\prime}$ and $t \rightarrow t^{\prime}$, which preserves the squared interval $\left(-c^{2} t^{2}+x^{2}=-c^{2} t^{\prime 2}+x^{\prime 2}\right)$, is the linear transform given by
$$\begin{pmatrix} x^{\prime} \\ c t^{\prime} \end{pmatrix}=\begin{pmatrix} \cosh \theta & \sinh \theta \\ \sinh \theta & \cosh \theta \end{pmatrix}\begin{pmatrix} x \\ c t \end{pmatrix}, \tag{1.11}$$
which works because $\cosh ^{2} \theta-\sinh ^{2} \theta=1$. This is known as a Lorentz transformation. If frame $S^{\prime}$ moves$^{12}$ at speed $v \equiv \beta c$ with respect to frame $S$, a particle located at a point in space which is stationary in $S$ is moving in $S^{\prime}$ with velocity $-v$. If we then set $x=0$ we have that $x^{\prime}=c t \sinh \theta$ and $t^{\prime}=t \cosh \theta$, but $x^{\prime} / t^{\prime}=-v$, so we deduce that $v=-c \tanh \theta$, or equivalently $\beta=-\tanh \theta$. This means that with the definition
$$\gamma=\left(1-\beta^{2}\right)^{-1 / 2} \tag{1.12}$$
we have that $\gamma=\left(1-\beta^{2}\right)^{-1 / 2}=\left(1-\tanh ^{2} \theta\right)^{-1 / 2}=\cosh \theta$ and $\beta \gamma=-\tanh \theta \cosh \theta=-\sinh \theta$. This puts the Lorentz transformation into the more familiar form$^{13}$
$$\begin{pmatrix} x^{\prime} \\ c t^{\prime} \end{pmatrix}=\begin{pmatrix} \gamma & -\beta \gamma \\ -\beta \gamma & \gamma \end{pmatrix}\begin{pmatrix} x \\ c t \end{pmatrix}. \tag{1.13}$$
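To see that the hyperbolic form (eqn 1.11) and the $\beta$, $\gamma$ form (eqn 1.13) describe the same transformation, and that both preserve $-c^{2}t^{2}+x^{2}$, one can run a small numerical experiment. The Python sketch below is one way to do this; the function names, the choice $\beta=0.6$ and the sample event are all illustrative assumptions.

```python
import math


def boost_from_beta(beta):
    """2x2 matrix of eqn 1.13, acting on the column vector (x, ct)."""
    gamma = 1.0 / math.sqrt(1.0 - beta**2)
    return [[gamma, -beta * gamma], [-beta * gamma, gamma]]


def boost_from_rapidity(theta):
    """2x2 matrix of eqn 1.11, acting on the column vector (x, ct)."""
    return [[math.cosh(theta), math.sinh(theta)],
            [math.sinh(theta), math.cosh(theta)]]


def apply(matrix, x, ct):
    """Multiply a 2x2 matrix into the column vector (x, ct)."""
    return (matrix[0][0] * x + matrix[0][1] * ct,
            matrix[1][0] * x + matrix[1][1] * ct)


beta = 0.6                   # illustrative relative speed, v = 0.6c
theta = -math.atanh(beta)    # from beta = -tanh(theta)
x, ct = 2.0, 5.0             # illustrative event

x1, ct1 = apply(boost_from_beta(beta), x, ct)
x2, ct2 = apply(boost_from_rapidity(theta), x, ct)

print("eqn 1.13:", x1, ct1)
print("eqn 1.11:", x2, ct2)
print("squared interval before and after:", -ct**2 + x**2, -ct1**2 + x1**2)
```

Both matrices give the same transformed event, and the squared interval is unchanged up to rounding.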
$^{11}$ This is all that was envisaged in Einstein's 1905 paper on special relativity.
Fig. 1.4 A spacetime diagram near the origin, showing points which are spacelike $\left(\Delta s^{2}>0\right)$ and timelike $\left(\Delta s^{2}<0\right)$ separated from the origin. The set of points with $\Delta s^{2}=0$ are on the light cone.
Fig. 1.5 The $xt$ plane.
$^{12}$ We define the quantity $\beta$ using
$$\beta=\frac{v}{c}.$$
$^{13}$ Written out in components, this is
$$x^{\prime}=\gamma(x-v t), \qquad t^{\prime}=\gamma\left(t-\frac{v x}{c^{2}}\right).$$
$^{14}$ In general it preserves the square of the interval $-c^{2} t^{2}+x^{2}+y^{2}+z^{2}$ in all frames, but here we are just considering one spatial dimension.
$^{15}$ The proof is straightforward. Setting $\Delta t^{\prime}=0$ in eqn 1.13 gives $\beta=c \Delta t / \Delta x$, which only has a sensible solution $(|\beta|<1)$ for spacelike intervals.
$^{16}$ We are therefore forced to drop our common-sense points 1 and 2 from the start of the chapter.
Fig. 1.6 A spacetime diagram in special relativity leads to a notion in which the region of spacetime outside the light cone is an extended present.
$^{17}$ This also kills off common-sense point 4.
Fig. 1.7 The quantity $\gamma=\left(1-\beta^{2}\right)^{-1 / 2}$ as a function of $\beta=v / c$.
The Lorentz transformation preserves the square of the interval$^{14}$ $-c^{2} t^{2}+x^{2}$ in all frames. Let's now state a couple of important consequences of this transformation for different types of intervals.
(1) For any two points separated by any spacelike interval, one can find a reference frame$^{15}$ for which their separation $\Delta t^{\prime}=0$, i.e. the two events separated by that interval occur simultaneously. Therefore, one can think of the set of points outside the light cone as an extended present, a region of spacetime which is not causally connected to the origin but is potentially simultaneous with it (in some reference frame). We now realize that our notion of 'now' is not a horizontal plane in spacetime as in Fig. 1.1 but forms everything outside the light cone (see Fig. 1.6). Strangely we have access to our past and our future, but it is the extended present, the 'now', which we have no access to! Our notions of simultaneity have been dramatically altered.$^{16}$
Example 1.4
Spacelike intervals can be measured using rulers. A ruler is a device for measuring a spacelike length $\Delta x$. (Length being the difference in two spatial coordinates evaluated at the same value of the time coordinate.) If the ruler is stationary in frame $S^{\prime}$ and has length $L$ then it doesn't matter when you measure the location of its two ends. If the ruler is moving then it can still be used to measure distances but it is then critical you measure its two ends at the same time. In frame $S$, in which the ruler moves, we therefore locate the two ends with $\Delta t=0$, and eqn 1.13 yields
$$\begin{pmatrix} L \\ c \Delta t^{\prime} \end{pmatrix}=\begin{pmatrix} \gamma & -\beta \gamma \\ -\beta \gamma & \gamma \end{pmatrix}\begin{pmatrix} \Delta x \\ 0 \end{pmatrix}, \tag{1.14}$$
and hence $\Delta x=L / \gamma$ and the moving ruler is shorter than it is in its rest frame (remember, $\gamma \geq 1$; see Fig. 1.7). This effect is known as Lorentz contraction. The ruler's length when it is stationary, $L$, is called the rest length or proper length.$^{17}$
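Both point (1) and Example 1.4 can be checked with a few lines of Python. The sketch below works in one spatial dimension and uses the combination $c\Delta t$ throughout, so no value of $c$ is needed. Setting $\Delta t^{\prime}=0$ in eqn 1.13 gives $c\Delta t=\beta \Delta x$, i.e. $\beta=c \Delta t / \Delta x$, which is what `simultaneous_frame_beta` returns. All function names and sample numbers are illustrative assumptions.

```python
import math


def gamma_factor(beta):
    """Lorentz factor of eqn 1.12."""
    return 1.0 / math.sqrt(1.0 - beta**2)


def boost(beta, dx, c_dt):
    """Apply eqn 1.13 to a separation (dx, c*dt); returns (dx', c*dt')."""
    g = gamma_factor(beta)
    return g * (dx - beta * c_dt), g * (c_dt - beta * dx)


def simultaneous_frame_beta(dx, c_dt):
    """Speed (in units of c) of the frame in which two events are simultaneous."""
    if abs(c_dt) >= abs(dx):
        raise ValueError("interval is not spacelike; no such frame exists")
    return c_dt / dx


def contracted_length(rest_length, beta):
    """Length of a ruler moving at speed beta*c: Delta x = L / gamma."""
    return rest_length / gamma_factor(beta)


# Point (1): a spacelike separation with dx = 5, c*dt = 3 (arbitrary units).
beta = simultaneous_frame_beta(dx=5.0, c_dt=3.0)
dx_prime, c_dt_prime = boost(beta, 5.0, 3.0)
print("beta =", beta, " dx' =", dx_prime, " c*dt' =", c_dt_prime)

# Example 1.4: a ruler of rest length 1 moving at 0.6c.
moving_length = contracted_length(1.0, 0.6)
rest_length, _ = boost(0.6, moving_length, 0.0)  # ends located with dt = 0 in S
print("moving length =", moving_length, " recovered rest length =", rest_length)
```

For a timelike or null separation the first function raises an error, reflecting the fact that no frame with $|\beta|<1$ makes such events simultaneous.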
(2) For two points separated by any timelike interval (which has negative $\Delta s^{2}$), the straight-line path between those two points represents the longest 'distance' (i.e. interval) between them, so that small deviations from this path result in a shorter interval. This surprising result is related to the famous twin paradox and we will explore this in Example 1.6. Before that, we will explain how time is measured in special relativity.
Example 1.5
For a timelike interval $\Delta s^{2}<0$ it is helpful to define a real quantity $\Delta \tau$ (with units of time) by
$$\Delta \tau^{2}=-\frac{\Delta s^{2}}{c^{2}}. \tag{1.15}$$
We call $\tau$ the proper time because it yields the time in the rest frame of a particular particle; it is measured using a clock in that reference frame. In a general frame, we define the interval by eqn 1.9 $\left(\Delta s^{2} \equiv-c^{2} \Delta t^{2}+\Delta x^{2}+\Delta y^{2}+\Delta z^{2}\right)$, but by the invariance of the interval
$$\Delta s^{2} \equiv-c^{2} \Delta t^{2}+\Delta x^{2}+\Delta y^{2}+\Delta z^{2}=-c^{2} \Delta \tau^{2}, \tag{1.16}$$
where $\tau$ measures the time elapsed in the rest frame. Moving back to infinitesimal changes we can use eqn 1.16 to show that
$$\begin{aligned} \mathrm{d}\tau &=\left[(\mathrm{d}t)^{2}-\frac{(\mathrm{d}x)^{2}+(\mathrm{d}y)^{2}+(\mathrm{d}z)^{2}}{c^{2}}\right]^{1 / 2} \\ &=\mathrm{d}t\left\{1-\frac{1}{c^{2}}\left[\left(\frac{\mathrm{d}x}{\mathrm{d}t}\right)^{2}+\left(\frac{\mathrm{d}y}{\mathrm{d}t}\right)^{2}+\left(\frac{\mathrm{d}z}{\mathrm{d}t}\right)^{2}\right]\right\}^{1 / 2} \\ &=\frac{\mathrm{d}t}{\gamma}. \end{aligned} \tag{1.17}$$
This demonstrates an effect known as time dilation,$^{18}$ showing that the time elapsed between two events is longest for a clock that is at rest in an inertial frame and present at both events: a clock moving in that frame accumulates only $\mathrm{d}\tau=\mathrm{d}t / \gamma \leq \mathrm{d}t$. This effect is sometimes remembered using the slogan 'moving clocks run slow'. This phrase sometimes causes confusion. Clocks run in their rest frames at a particular rate; it is only when they are viewed from reference frames in which they are moving that they are deduced to run slow.$^{19}$
For any deviation from the straight-line path the elapsed time will be shorter because additional segments of motion through space will reduce the value of the elapsed time. We can treat this in general using eqn 1.17 by writing the time elapsed $\tau$ along a path in spacetime (between two points $\alpha$ and $\beta$) as
$$\begin{aligned} \tau=\int_{\tau_{\alpha}}^{\tau_{\beta}} \mathrm{d}\tau &=\int_{\tau_{\alpha}}^{\tau_{\beta}}\left[(\mathrm{d}t)^{2}-\frac{(\mathrm{d}x)^{2}+(\mathrm{d}y)^{2}+(\mathrm{d}z)^{2}}{c^{2}}\right]^{1 / 2} \\ &=\int_{t_{\alpha}}^{t_{\beta}} \mathrm{d}t\left\{1-\frac{1}{c^{2}}\left[\left(\frac{\mathrm{d}x}{\mathrm{d}t}\right)^{2}+\left(\frac{\mathrm{d}y}{\mathrm{d}t}\right)^{2}+\left(\frac{\mathrm{d}z}{\mathrm{d}t}\right)^{2}\right]\right\}^{1 / 2} \\ &=\int_{t_{\alpha}}^{t_{\beta}} \frac{\mathrm{d}t}{\gamma(t)}, \end{aligned} \tag{1.18}$$
where$^{20}$ $\gamma(t)=\left[1-v^{2}(t) / c^{2}\right]^{-1 / 2}$.
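Eqn 1.18 can be evaluated numerically for any speed profile $v(t)$ with $|v|<c$. The following Python sketch uses a simple midpoint rule (a choice of ours, not a method prescribed in the text) and works in units where $c=1$. The function name `proper_time`, the step count and the three sample speed profiles are illustrative assumptions.

```python
import math
from typing import Callable


def proper_time(speed: Callable[[float], float], t_start: float, t_end: float,
                c: float = 1.0, steps: int = 10_000) -> float:
    """Midpoint-rule estimate of eqn 1.18: tau = integral of dt / gamma(t)."""
    dt = (t_end - t_start) / steps
    total = 0.0
    for i in range(steps):
        t_mid = t_start + (i + 0.5) * dt
        beta = speed(t_mid) / c
        total += math.sqrt(1.0 - beta**2) * dt  # 1/gamma = sqrt(1 - beta^2)
    return total


# Ten units of coordinate time, three different clocks (units with c = 1).
at_rest = proper_time(lambda t: 0.0, 0.0, 10.0)
steady = proper_time(lambda t: 0.6, 0.0, 10.0)
varying = proper_time(lambda t: 0.6 * abs(math.sin(math.pi * t / 10.0)), 0.0, 10.0)

print("clock at rest:     ", at_rest)
print("steady speed 0.6c: ", steady)
print("varying speed:     ", varying)
```

The clock at rest records the full coordinate time, while any clock that moves during the interval records less, in line with the statement that deviations from the straight-line path shorten the elapsed time.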
Example 1.6
The ideas from the last example can be used to resolve the famous twin paradox.$^{21}$ Consider two twins A and B whose clocks are synchronized. Twin A remains on Earth, while twin B is briefly accelerated to speed $v$ and travels to Proxima Centauri at a distance $x^{*}$ from Earth (journey time $x^{*}/v$ in A's frame). B is then briefly decelerated and made to return home with velocity $-v$ (arriving home after a total journey time of $2x^{*}/v$ in A's frame). Both twins age at the same rate, according to their own individual clocks. However, when they meet at the end of B's journey they find that twin A has aged more than twin B. From A's perspective, B's clock runs slow (time dilation), so that one hour experienced by B is $\gamma>1$ hours for A. But couldn't B argue that from their perspective it was B that remained stationary and A did all the travelling? The resolution is that A and B do not have identical experiences; while A has remained at rest in a single inertial frame, B has not, as the accelerometer in B's spacecraft will have recorded. Thus, there is no paradox because the situations are not symmetric. Because the interval is frame-independent, it suffices to work it out in A's frame (see Fig. 1.8). The straight-line path of A yields $\Delta s_{\mathrm{A}}^{2}=-c^{2}\left(2 x^{*} / v\right)^{2}$, corresponding to a total time of $2 x^{*} / v$.
The more circuitous path taken by B has two segments, each of which has $\Delta s_{\mathrm{B}}^{2}=x^{* 2}-c^{2}\left(x^{*} / v\right)^{2}=-c^{2}\left(x^{*} / v \gamma\right)^{2}$, leading to a total time interval of $2 x^{*} / v \gamma$, which is indeed a factor of $\gamma$ down from A's time interval (as we deduced from appreciating that B's clocks run slow in A's frame). The fact that B's world line (the path through spacetime) appears longer than A's world line in Fig. 1.8, and yet takes less time to travel, is all due to the minus sign in the expression for the interval.
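To put numbers to the two results $2x^{*}/v$ and $2x^{*}/v\gamma$, here is a short sketch. The values are assumptions chosen for illustration: a distance of about 4.2 light-years to Proxima Centauri and a cruising speed of $0.8c$. The function name `twin_ages` is hypothetical.

```python
import math

def twin_ages(distance_ly, beta):
    """Return (time elapsed for A, time elapsed for B) in years.

    distance_ly: one-way distance x* in light-years, measured in A's frame.
    beta: cruising speed v/c. The brief acceleration phases are neglected,
    as in the example.
    """
    gamma = 1.0 / math.sqrt(1.0 - beta ** 2)
    t_a = 2.0 * distance_ly / beta   # 2 x*/v, using c = 1 light-year per year
    t_b = t_a / gamma                # 2 x*/(v gamma)
    return t_a, t_b

# Assumed illustrative numbers: x* = 4.2 ly and v = 0.8c, for which gamma = 5/3.
t_a, t_b = twin_ages(4.2, 0.8)
print(f"A ages {t_a:.2f} yr, B ages {t_b:.2f} yr")
```

With these inputs A ages 10.5 years while B ages 6.3 years, a ratio of $\gamma = 5/3$.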
$^{18}$ The same result as in the previous example can also be obtained directly from the Lorentz transformation. Any timelike interval (involving $\Delta x$, $\Delta t$) can be turned into one involving zero spatial distance using an appropriate Lorentz transformation into a frame with $\beta=-\Delta x / c \Delta t$ (so that $\Delta x^{\prime}=0$, using eqn 1.13), leading to $\Delta \tau \equiv \Delta t^{\prime}=(\Delta t) / \gamma$ (in agreement with eqn 1.17). The squared interval for this is then $\Delta s^{2}=\Delta x^{2}-c^{2} \Delta t^{2}=-c^{2} \Delta \tau^{2}=-c^{2} \Delta t^{2} / \gamma^{2}$.
$^{19}$ A famous example is the cosmic ray muon which is generated in the upper atmosphere and makes it down to ground level. Muons have a lifetime of $2.2\,\mu\mathrm{s}$ in their rest frames, which serves as their clock. Even if they travelled at the speed of light, they should only make it a short distance down from the upper atmosphere. The fact that many arrive on the ground is due to the time dilation effect; their clocks seem to be running slowly due to their high speed (large $\gamma$). In the muon's reference frame, the effect is due to the Lorentz contraction of the atmosphere, which is rushing towards it!
$^{20}$ Common-sense point 3 must now bite the dust too!
$^{21}$ The twin paradox, as we shall see, is only an apparent paradox.
Fig. 1.8 A spacetime diagram for the twin paradox.
Fig. 1.9 A path through spacetime.
$^{22}$ This clock may be a wristwatch, a radioactive source that measurably decays in activity as time increases, or may simply be the fact that the observer is slowly ageing.
Fig. 1.10 The function $y(x)$ minimizes $I$. We consider small deviations from $y(x)$ given by $y(x)+\epsilon \eta(x)$ (dashed line) where $\eta(x)$ vanishes at $x=a$ and $x=b$ and $\epsilon$ is a small parameter.
$^{23}$ Though note that this condition will give us the extremal value of the integral, which could be a minimum or a maximum (or, and this becomes important in higher dimensions, a saddle point).

1.4 Paths through spacetime

We have seen that events are points in spacetime, and thus are instantaneous in time and localized in space. Events are witnessed and recorded by observers, who are each equipped with some kind of clock$^{22}$ which tracks the time in the observer's reference frame (i.e. measures the observer's proper time). The path the observer takes through spacetime (Fig. 1.9) is a chain of events connected by infinitesimal timelike intervals (the observer's speed through spacetime has to be less than $c$) and this path is known as the observer's world line.
We can now ask a simple question about paths through spacetime: what is the shortest distance between two points? This can be worked out using a technique in mathematics known as the calculus of variations and we review this in the following example for the simple case of usual flat (or Euclidean) space.

Example 1.7

In the calculus of variations, one deals with an integral of the form $I=\int_{a}^{b} F\left(y(x), y^{\prime}(x), x\right) \mathrm{d} x$, where $y^{\prime}=\mathrm{d} y / \mathrm{d} x$. We want to find the form of $y(x)$ that minimizes $I$, while ensuring that $y(a)$ and $y(b)$ are fixed (see Fig. 1.10). The method assumes that you can make small variations to $y(x)$ by adding a tiny bit of another function to it, so that
\begin{equation*}
y(x) \rightarrow y(x)+\epsilon \eta(x)
\end{equation*}
where $\epsilon$ is a small number and $\eta(x)$ must vanish at $x=a$ and $x=b$. Then we look for the condition$^{23}$
\begin{equation*}
\left.\frac{\mathrm{d} I}{\mathrm{d} \epsilon}\right|_{\epsilon=0}=0 \quad \text{for all } \eta(x). \tag{1.20}
\end{equation*}
We can then write
\begin{align*}
I & =\int_{a}^{b} F\left(y+\epsilon \eta, y^{\prime}+\epsilon \eta^{\prime}, x\right) \mathrm{d} x \\
& =\int_{a}^{b} F\left(y, y^{\prime}, x\right) \mathrm{d} x+\epsilon \int_{a}^{b}\left(\frac{\partial F}{\partial y} \eta+\frac{\partial F}{\partial y^{\prime}} \eta^{\prime}\right) \mathrm{d} x+O\left(\epsilon^{2}\right) \\
& =\int_{a}^{b} F\left(y, y^{\prime}, x\right) \mathrm{d} x+\epsilon\, \delta I+O\left(\epsilon^{2}\right) \tag{1.21}
\end{align*}
and our condition will be satisfied if $\epsilon\, \delta I=0$ for all $\eta(x)$. One of the integrals can be done by parts
\begin{equation*}
\int_{a}^{b} \frac{\partial F}{\partial y^{\prime}} \eta^{\prime}\, \mathrm{d} x=\left[\frac{\partial F}{\partial y^{\prime}} \eta\right]_{a}^{b}-\int_{a}^{b} \frac{\mathrm{d}}{\mathrm{d} x}\left(\frac{\partial F}{\partial y^{\prime}}\right) \eta\, \mathrm{d} x \tag{1.22}
\end{equation*}
and the term in square brackets vanishes because $\eta(a)=\eta(b)=0$. Thus,
\begin{equation*}
\delta I=\int_{a}^{b}\left[\frac{\partial F}{\partial y}-\frac{\mathrm{d}}{\mathrm{d} x}\left(\frac{\partial F}{\partial y^{\prime}}\right)\right] \eta(x)\, \mathrm{d} x \tag{1.23}
\end{equation*}
and this will be zero for all $\eta(x)$ if we satisfy
\begin{equation*}
\frac{\partial F}{\partial y}-\frac{\mathrm{d}}{\mathrm{d} x}\left(\frac{\partial F}{\partial y^{\prime}}\right)=0 \tag{1.24}
\end{equation*}
which is known as the Euler-Lagrange equation.
We can now apply this to the case of the shortest distance between two points in Euclidean space. The length $\ell$ of a path between two points is given by
\begin{equation*}
\ell=\int \mathrm{d} s=\int \sqrt{(\mathrm{d} x)^{2}+(\mathrm{d} y)^{2}}=\int \sqrt{1+y^{\prime 2}}\, \mathrm{d} x=\int F\left(y^{\prime}, x\right) \mathrm{d} x . \tag{1.25}
\end{equation*}
The integrand $F$ is a function of $y^{\prime}$, not $y$, and so $\partial F / \partial y=0$ and we can work out that $\partial F / \partial y^{\prime}=y^{\prime} / \sqrt{1+y^{\prime 2}}$. The Euler-Lagrange equation then gives $\mathrm{d} / \mathrm{d} x\left(y^{\prime} / \sqrt{1+y^{\prime 2}}\right)=0$, which is solved by $y^{\prime}=$ constant (let's call it $m$). The solution is then $y=m x+c$, where $c$ is another constant, and so is evidently a straight line.
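The straight-line result can be checked numerically by perturbing the path as $y(x)+\epsilon\eta(x)$ and watching the length grow. This is only an illustrative sketch: the end points, the slope and intercept, and the choice $\eta(x)=\sin[\pi(x-a)/(b-a)]$ are assumptions, and the intercept is called `k` here to avoid any clash with the speed of light.

```python
import math

def path_length(y, a, b, steps=2000):
    """Length of the curve y(x) from x = a to x = b, summed over short chords."""
    dx = (b - a) / steps
    total = 0.0
    for i in range(steps):
        x0 = a + i * dx
        x1 = x0 + dx
        total += math.hypot(dx, y(x1) - y(x0))
    return total

a, b = 0.0, 1.0
m, k = 2.0, 1.0   # slope and intercept of the straight line (arbitrary choice)

def eta(x):
    # Trial deviation that vanishes at both end points, as the method requires.
    return math.sin(math.pi * (x - a) / (b - a))

def perturbed(eps):
    return lambda x: m * x + k + eps * eta(x)

lengths = {eps: path_length(perturbed(eps), a, b)
           for eps in (-0.2, -0.1, 0.0, 0.1, 0.2)}
straight = math.hypot(b - a, m * (b - a))   # exact length of the straight line
```

Every non-zero $\epsilon$ gives a longer path than $\epsilon=0$, as expected for a minimum.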
We can now use this technique for working out the shortest interval between two points in spacetime (in the special case of a single spatial dimension). The proper time elapsed along a path between two points is
\begin{equation*}
\tau=\int \mathrm{d} \tau=\int \sqrt{(\mathrm{d} t)^{2}-\frac{1}{c^{2}}(\mathrm{d} x)^{2}}=\int \mathrm{d} t \sqrt{1-\frac{1}{c^{2}}\left(\frac{\mathrm{d} x}{\mathrm{d} t}\right)^{2}} \tag{1.26}
\end{equation*}
and so is a bit different from the Euclidean case. However, application of the Euler-Lagrange equation also gives a straight-line solution. Here we have to remember that the Euler-Lagrange equation identifies a stationary solution and in this case the solution is a maximum, not a minimum. We can prove that very simply: consider a timelike interval between the origin $(0,0)$ and the point $(x, t)$. One can move to a frame$^{24}$ in which this interval is purely along the time axis, whereupon it becomes $(0, t / \gamma)$. The straight-line path thus corresponds to an elapsed time of $t / \gamma$. Any deviation from this straight-line path will result in a shorter elapsed time, because excursions along the $x$-axis reduce the elapsed proper time through $\mathrm{d} \tau=\sqrt{(\mathrm{d} t)^{2}-\frac{1}{c^{2}}(\mathrm{d} x)^{2}}$. This is, of course, the twin paradox all over again.
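The same numerical experiment can be repeated for eqn 1.26, this time showing that the straight world line gives the longest proper time. The sketch below works in units with $c=1$. The end event, the perturbation $\epsilon\sin(\pi t/T)$ and the function names are illustrative assumptions, with $\epsilon$ chosen small enough that every world line stays timelike.

```python
import math

def elapsed_proper_time(x, t_a, t_b, steps=2000):
    """Proper time along the world line x(t), in units with c = 1."""
    dt = (t_b - t_a) / steps
    tau = 0.0
    for i in range(steps):
        t0 = t_a + i * dt
        dx = x(t0 + dt) - x(t0)
        tau += math.sqrt(dt * dt - dx * dx)   # d tau for one short segment
    return tau

T, X = 10.0, 6.0   # end event (x, t) = (6, 10): velocity 0.6, so gamma = 1.25

def world_line(eps):
    # Straight line plus a deviation that vanishes at both end events.
    return lambda t: X * t / T + eps * math.sin(math.pi * t / T)

taus = {eps: elapsed_proper_time(world_line(eps), 0.0, T)
        for eps in (-0.5, -0.25, 0.0, 0.25, 0.5)}
```

For $\epsilon=0$ the sum reproduces $t/\gamma = 8$, and every perturbed world line records less.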
We will often parametrize paths using the proper time $\tau$ as a way of recording how far along a path an observer has travelled. Of course, you can set the zero of proper time any way you wish, and you can measure time in units of seconds, hours, or months as you please. For this reason, any affine$^{25}$ scaling of $\tau$ will do. An affine transformation of $\tau$ can be written as
\begin{equation*}
\lambda=a \tau+b \tag{1.27}
\end{equation*}
where $a$ and $b$ are real numbers and our new affine parameter $\lambda$ is just the proper time in different units with a different zero of time. We will have cause to use affine parameters later on when we tackle general relativity.

1.5 Experiments

In this chapter, we have outlined the consequences of Einstein's bold vision of 1905 that led to the formulation of special relativity. Why should we believe any of this? The answer is that this theory agrees spectacularly well with experiment, although most of the experiments were done after 1905. In this section, we briefly summarize some of these.
$^{24}$ A frame moving with velocity $-x / t$, so that $\gamma$ is given by $\left[1-(x / c t)^{2}\right]^{-1 / 2}$.
$^{25}$ The word affine comes from the Latin affinis meaning 'related to' or 'connected with'.
$^{26}$ Although justly lauded as a landmark experiment and known by Einstein, it is not clear that the Michelson-Morley experiment was a major influence on his thinking. The book by Cheng discusses Einstein's motivations.
$^{27}$ A good example can be found in C. Braxmaier et al., Phys. Rev. Lett. 88, 010401 (2002).
$^{28}$ C. W. Chou, D. B. Hume, T. Rosenband and D. J. Wineland, Science 329, 1630 (2010).
$^{29}$ A review of recent results can be found in S. Liberati, Class. Quantum Grav. 30, 133001 (2013).
  • The speed of light is absolute and constant: The Michelson-Morley experiment (1887) demonstrated that the total time for light to traverse, in free space, a distance $\ell$ and to return back again is independent of its direction. This was accomplished by allowing light to travel back and forth along two perpendicular arms of equal length in a Michelson interferometer.$^{26}$ The Kennedy-Thorndike experiment (1932) was a modification in which the arms of the interferometer are of unequal length. This experiment shows the time for light to traverse a closed path is independent of not only the orientation of the apparatus but also its velocity. Modern versions$^{27}$ of this experiment frequently use two lasers, one locked to a well-known transition (such as a molecular absorption line, with frequency $\nu_{\text{ref}}$) and the other locked to a very stable Fabry-Pérot reference cavity (with frequency $\nu_{\text{cav}}=n c(v) /(2 \ell)$, where $n$ is the mode number and $\ell$ is the length of the cavity, the speed of light $c(v)$ being allowed the possibility of depending on $v$). The difference between these two frequencies is measured precisely and monitored over time (as the laboratory velocity changes while the Earth orbits the Sun).
  • Time dilation does occur: The Ives-Stilwell experiment (1938) used the Doppler shift in light from a moving source (accelerated ions) to infer time dilation. Time dilation is also used to interpret the flux of cosmic muons, as discussed earlier, though modern experiments use muon beams in accelerators (and from the muons' perspective, where the accelerator beamline is Lorentz contracted, this demonstrates Lorentz contraction). Modern Ives-Stilwell-type experiments have used heavy ion storage rings and laser spectroscopy to improve precision. A particularly elegant version$^{28}$ uses very slowly moving ions together with extremely accurate spectroscopy. Two optical clocks based on laser-cooled $\mathrm{Al}^{+}$ ions are operated, but in one of them the $\mathrm{Al}^{+}$ ion is given a velocity by an applied static electric field. The frequency emitted by the two clocks can be measured (to an accuracy of $10^{-17}$) and accurately compared, providing agreement with Einstein's theory even though the velocity of one of the ions is only a rather sluggish $\approx 10\,\mathrm{m\,s^{-1}}$.
  • Lorentz invariance holds: Numerous experiments have been performed to test Lorentz invariance to a high level of precision. No significant departure from Lorentz invariance has yet been found.$^{29}$
  • Relativity has been used for more than a century: This is not a good argument, as Newtonian physics had survived unscathed for more than two centuries but was eventually superseded. However, we find that Newtonian physics still has a very wide domain of applicability (and we now understand the limits of that domain). Relativity may one day meet its match (and we expect an as-yet unformulated theory of quantum gravity will take its place), but it has so far proved reliable in the design and operation of particle accelerators, the understanding of phenomena in astrophysics, telecommunications, the space programme and condensed matter physics. The experiments we've described above have stringently tested many aspects of relativity, and we have now accumulated ample evidence that it works across many branches of physics.
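To get a feel for why a clock accuracy of $10^{-17}$ matters for an ion moving at only about $10\,\mathrm{m\,s^{-1}}$, one can estimate the fractional time-dilation shift $\gamma-1$ at that speed. The sketch below is a rough estimate only and is not the analysis of the cited experiment; the function name is hypothetical and the low-speed series expansion is used as an assumption valid for $v \ll c$.

```python
C = 299_792_458.0  # speed of light in m/s (exact in SI units)

def fractional_time_dilation(v):
    """Estimate gamma - 1 for v << c from the first two series terms.

    gamma - 1 = beta^2 / 2 + 3 beta^4 / 8 + ...
    The series is used because evaluating 1/sqrt(1 - beta^2) - 1 directly in
    double precision loses most of its significant digits at such low speeds.
    """
    beta2 = (v / C) ** 2
    return 0.5 * beta2 + 0.375 * beta2 ** 2

shift = fractional_time_dilation(10.0)   # ion speed quoted in the text
print(f"gamma - 1 is roughly {shift:.2e}")
```

The result is a few parts in $10^{16}$, comfortably larger than the quoted measurement accuracy.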

Chapter summary

  • The speed of light is the same in all inertial frames. This has profound consequences for the nature of reality, including time dilation, length contraction, and a revolution in our notion of simultaneity and the meaning of the present.
  • In relativity we deal with events. The history of a particle, given in terms of events, forms its world line.
  • The square of the invariant interval $\mathrm{d} s^{2}$ between two events will be identical, no matter which coordinate system is used to evaluate it.
  • The straight-line world line between two timelike separated points maximizes the interval. Deviations from this result in a smaller interval and hence a shorter elapsed time (which helps explain the twin paradox).
  • A light cone is defined by $\mathrm{d} s^{2}=0$.
  • The predictions of special relativity have been tested in detail and the theory is strongly supported by substantial experimental evidence.

Exercises

(1.1) Review the theory of special relativity and the derivations for the breakdown of simultaneity, the extended present, time dilation, the Lorentz contraction and the twin paradox. Give a critique of the 'common sense' statements in Section 1.1.
(1.2) The proper mean lifetime of a muon is $2.2\,\mu\mathrm{s}$. Muons are formed in the upper atmosphere due to the collision of cosmic rays with molecules in the atmosphere. If such muons travel down to the Earth's surface with a speed of $0.995c$, calculate their mean distance travelled before decaying (a) ignoring the effect of time dilation and (b) including the effect of time dilation.
(1.3) We would like to measure the interval $\Delta s$ between events $p$ (on our world line) and $q$ (not on our world line), using only a clock and a light pulse. To do this we emit a light pulse at event $r$ which strikes event $q$ and is reflected back, meeting our world line at event $u$. We measure the proper time interval between $r$ and $p$, which we call $\tau_{2}$, and the proper time interval between $p$ and $u$, which we call $\tau_{1}$. Show that $\Delta s^{2}=c^{2} \tau_{1} \tau_{2}$.
(1.4) The quantity $(\gamma-1)$ provides a measure of the difference between special-relativistic and Newtonian mechanics. What values of $\beta$ are needed to obtain a value of $(\gamma-1)$ equal to (a) 0.01, (b) 0.1, (c) 1, (d) 10, (e) 100?

2


Fig. 2.1 We can't draw spacetime very accurately since it has $3+1$ dimensions but here is an attempt. In this diagram, the three spatial dimensions have been flattened, unceremoniously, into a plane (shaded). The path of a photon ($\gamma$) is also shown.
$^{1}$ The convention used is to write the index as a superscript, i.e. it goes in the upstairs position. Keep an eye out for whether an index goes upstairs or downstairs because this will have a significance that we will explain later in this chapter.
$^{2}$ In other words, they depend on the reference frame used.

Vectors in flat spacetime

Whether 'tis nobler in the mind to suffer
The slings and arrows of outrageous fortune,
Or to take Arms against a Sea of troubles ..
William Shakespeare (1564-1616) Hamlet (Act III, Scene I)
In special relativity, we are dealing with flat spacetime because gravity is ignored. Let's consider what kind of physical quantities might exist in such a spacetime (see Fig. 2.1). The first type we might think of is a scalar. A scalar is simply a number, and takes the same value in every inertial frame. It is thus said to be Lorentz invariant. Examples of scalars include the electric charge and rest mass of a particle.
The second type of quantity is the subject of this chapter: a vector. This quantity can be thought of geometrically as an arrow in spacetime. However, we might also wish to choose a particular reference frame and describe the components of this vector with respect to a particular basis. To do this we will need to specify a coordinate system in which to work. Because we are dealing (for now) with flat spacetime, a choice of coordinates made in one part of spacetime will work throughout the whole of spacetime. As we shall see later, this rather convenient property will not work in a curved spacetime, and there our coordinates will generally only apply locally. (In the same way, a local map of New York, printed on a two-dimensional sheet of paper, cannot be extended to the whole Earth because the planet is spherical.)

Example 2.1

We have seen that the basic currency of relativity is the event. Examples of events include the emission of a photon, receiving a photon, hearing a loud noise or being shot by an arrow. Events are witnessed and recorded by observers. The simplest class of events occurs directly at the point in space occupied by the observer carrying a clock. The observer assigns a time, as measured on their clock, to the event. Once we have coordinate frames at our disposal, we can record events that occur at different points in spacetime as well as the intervals that separate them. We record the events in terms of the position on the coordinate grid and the time on the clock at that position. Events can therefore be expressed in the coordinates$^{1}$ of some frame
\begin{equation*}
x^{\mu}=\left(x^{0}, x^{1}, x^{2}, x^{3}\right)=(c t, x, y, z) \tag{2.1}
\end{equation*}
which will sometimes be written as $x^{\mu}=(c t, \vec{x})$, using the notion of the 3-vector $\vec{x}$ with spatial coordinates $x^{i}=(x, y, z)$, taken from the end of the alphabet. The location of an event in spacetime can be described by a 4-vector $\boldsymbol{x}$ considered as an arrow in spacetime (stretching, say, from the origin to the event). The particular coordinates $x^{\mu}$ relating to $\boldsymbol{x}$ depend on the basis chosen.$^{2}$
Note that in this chapter, and from now on unless otherwise indicated, we choose units such that $c=1$.
A vector isn't just any old collection of components. It is an object that has to transform appropriately under coordinate transformations.$^{3}$ In flat spacetime, 4-vectors are made from a timelike part and a spacelike part and are displayed in bold italics, so a position in spacetime is written as $\boldsymbol{x}$ where $\boldsymbol{x}$ has components $x^{\mu}=(t, \vec{x})$. Components for 4-vectors are given a Greek index, so for example $x^{\mu}$ where $\mu=0,1,2,3$. In the jargon, $x^{0}$ is the timelike component, and $x^{1}$, $x^{2}$ and $x^{3}$ are the spacelike components. The spacelike components themselves form a 3-vector, whose components are given a Roman index such as $x^{i}$, where $i=1,2,3$.

2.1 Vectors

In this chapter, we are going to consider the role of vectors in special relativity. We can think of a vector in special relativity as an arrow in spacetime. If we have two events at points $\mathcal{A}$ and $\mathcal{B}$ in flat spacetime, then we can define a vector$^{4}$ that points from $\mathcal{A}$ to $\mathcal{B}$ by
\begin{equation*}
\boldsymbol{X}=\mathcal{B}-\mathcal{A} . \tag{2.2}
\end{equation*}
Defined in this way, a vector lives independently of any coordinate system. The vector points from the event at point $\mathcal{A}$ to an event at $\mathcal{B}$, no matter what time and space coordinates we assign to the events (see Fig. 2.2). In order to express the vector in terms of coordinates, we need to define a set of basis vectors which we shall denote$^{5}$ by $\boldsymbol{e}_{\mu}$.
Example 2.2
Old-fashioned 3-vectors in Euclidean three-dimensional space are written as
\begin{equation*}
\vec{a}=a^{x} \vec{e}_{x}+a^{y} \vec{e}_{y}+a^{z} \vec{e}_{z} \tag{2.3}
\end{equation*}
Note that components are given upstairs indices, while basis vectors are given downstairs indices. As a result, the scalar product (or dot product) is written as
\begin{equation*}
\vec{a} \cdot \vec{b}=a^{x} b^{x}+a^{y} b^{y}+a^{z} b^{z} \tag{2.4}
\end{equation*}
In a Cartesian coordinate system, the basis vectors are orthonormal, expressed as$^{6}$
\begin{equation*}
\vec{e}_{i} \cdot \vec{e}_{j}=\delta_{i j} \tag{2.5}
\end{equation*}
A useful trick to note is that components of a vector can be projected out, that is, they can be extracted using
\begin{equation*}
\vec{a} \cdot \vec{e}_{x}=a^{x} \tag{2.6}
\end{equation*}
By analogy with Example 2.2, a 4-vector in spacetime is written as
\begin{equation*}
\boldsymbol{X}=X^{0} \boldsymbol{e}_{0}+X^{1} \boldsymbol{e}_{1}+X^{2} \boldsymbol{e}_{2}+X^{3} \boldsymbol{e}_{3}=X^{\mu} \boldsymbol{e}_{\mu} \tag{2.7}
\end{equation*}
where in the last equality we have used the Einstein summation convention, by which index variables (like $\mu$) repeated in both the upstairs and downstairs positions are assumed to be summed.
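The pattern of eqns 2.3 to 2.6 (build a vector by summing components times basis vectors, then recover each component by projection) can be spelled out in a few lines. This is a minimal sketch for the Euclidean case only. The helper names `dot` and `expand` and the component values are illustrative assumptions. The expansion in eqn 2.7 has the same structure, with the sum running over four labels instead of three.

```python
# Orthonormal Cartesian basis vectors of Euclidean 3-space (eqn 2.5).
basis = {
    "x": (1.0, 0.0, 0.0),
    "y": (0.0, 1.0, 0.0),
    "z": (0.0, 0.0, 1.0),
}

def dot(u, v):
    """Euclidean scalar product of two 3-vectors (eqn 2.4)."""
    return sum(ui * vi for ui, vi in zip(u, v))

def expand(components, basis_vectors):
    """Build a vector from its components by summing over the repeated index."""
    n = len(next(iter(basis_vectors.values())))
    total = [0.0] * n
    for label, comp in components.items():   # the implied sum over the index
        for k in range(n):
            total[k] += comp * basis_vectors[label][k]
    return tuple(total)

components = {"x": 3.0, "y": -1.0, "z": 2.0}   # arbitrary illustrative values
a = expand(components, basis)

# Projecting onto each basis vector recovers the matching component (eqn 2.6).
projected = {label: dot(a, e) for label, e in basis.items()}
```

Writing the sum as an explicit loop makes clear what the summation convention abbreviates: one term per value of the repeated index.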
3 A 3 A ^(3)A{ }^{3} \mathrm{~A}3 A good counterexample is the twocomponent 'shopping vector' that contains the price of fish and the price of bread in each component. If you approach the supermarket checkout with the trolley at 45 45 45^(@)45^{\circ}45 to the vertical, you will soon discover that the prices of your shopping will not transform appropriately. To use the jargon introduced earlier, vectors have to transform covariantly (see the discussion on per 3), and our 'shopping vector' fals page 3), and our shopping vector' fails. t isn't a vector at all, just a couple numbers surrounded by brackets.
$^{4}$ This makes it look like vectors and intervals are very similar, and so they are, at this stage. We'll see, however, that they lose this similarity when we start to look at curved spacetimes.
Fig. 2.2 A vector $\boldsymbol{X}$ lives free of any coordinate system. We can, however, impose a coordinate system and express a vector in terms of basis vectors $\boldsymbol{e}_{\mu}$ and its components $X^{\mu}$.
$^{5}$ The $\mu$ in $\boldsymbol{e}_{\mu}$ tells us which basis vector we're dealing with, rather than telling us which component of a vector we're talking about.
$^{6}$ We define the symbol $\delta_{ij}$ such that $\delta_{ij}=1$ when $i=j$ and $\delta_{ij}=0$ otherwise. It is known as the Kronecker delta.
$^{7}$ Although the prime is written on the subscript, the components $X^{\sigma^{\prime}}$ and basis vectors $\boldsymbol{e}_{\sigma^{\prime}}$ refer to a different coordinate system (the primed coordinate system) from that of $X^{\mu}$ and $\boldsymbol{e}_{\mu}$. Putting the prime on the indices, rather than on the variables themselves, might seem like an odd choice, but it will turn out to be very useful when we start dealing with more complicated equations.
Fig. 2.3 The unprimed and primed coordinate system, showing just one spatial direction.
$^{8}$ This equation can be written as

$$
X^{\mu^{\prime}}=\sum_{\nu} \Lambda^{\mu^{\prime}}{ }_{\nu} X^{\nu}
$$

Note that all the coordinate transformations we are considering are ones that preserve the origin, so that an event at $\vec{x}=0$ and $t=0$ in frame $S$ is mapped into $\vec{x}^{\prime}=0$ and $t^{\prime}=0$ in frame $S^{\prime}$.
$^{9}$ This is an example of the famous relation for differentials

$$
\mathrm{d} f=\frac{\partial f}{\partial x} \mathrm{~d} x+\frac{\partial f}{\partial y} \mathrm{~d} y+\frac{\partial f}{\partial z} \mathrm{~d} z
$$

for a function $f(x, y, z)$.
$^{10}$ In this chapter, the only transformation we shall consider is the Lorentz transformation. It will turn out that this rule applies more generally to components of vectors although, as discussed in the next chapter, only to the position vector in special cases.

2.2 Coordinate transformations

Since vectors $\boldsymbol{X}$ exist independently of bases and coordinates, they can be expressed in different coordinate systems (see Fig. 2.3) via a different set of basis vectors$^{7}$

$$
\boldsymbol{X}=X^{\mu} \boldsymbol{e}_{\mu}=X^{\sigma^{\prime}} \boldsymbol{e}_{\sigma^{\prime}} \tag{2.8}
$$

Special relativity is based on the observation that the components of 4-vectors transform between inertial frames according to the Lorentz transformations

$$
X^{\mu^{\prime}}=\Lambda^{\mu^{\prime}}{ }_{\nu} X^{\nu} \tag{2.9}
$$

where we represent the Lorentz transformations in component form using $\Lambda^{\mu^{\prime}}{ }_{\nu}$, which are functions of the relative velocity $v\,(=\beta)$ of the frames.

Example 2.3

The Lorentz transformation for the coordinates of an event in a frame $S$ and a frame $S^{\prime}$ (moving relative to frame $S$ at speed $\beta$ along the $x$-axis) can be rewritten in matrix form as

$$
\begin{pmatrix} X^{0^{\prime}} \\ X^{1^{\prime}} \\ X^{2^{\prime}} \\ X^{3^{\prime}} \end{pmatrix}
=
\begin{pmatrix} \gamma & -\beta \gamma & 0 & 0 \\ -\beta \gamma & \gamma & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} X^{0} \\ X^{1} \\ X^{2} \\ X^{3} \end{pmatrix}, \tag{2.10}
$$

or for short as

$$
X^{\mu^{\prime}}=\Lambda^{\mu^{\prime}}{ }_{\nu} X^{\nu} \tag{2.11}
$$

where $\Lambda^{\mu^{\prime}}{ }_{\nu}$ is the Lorentz transformation matrix. Here we have again used the Einstein summation convention, and the twice-repeated index which is assumed to be summed is $\nu$.$^{8}$
An example of a vector is the infinitesimal translation $\mathrm{d} \boldsymbol{x}$ which has components $\mathrm{d} x^{\nu}$ in frame $S$. In frame $S^{\prime}$, the components then change to $\mathrm{d} x^{\mu^{\prime}}=\Lambda^{\mu^{\prime}}{ }_{\nu} \mathrm{d} x^{\nu}$. Noting how each component resembles a differential of a function $x^{\mu^{\prime}}$, we recall that the ordinary rules of calculus also give us a rule for manipulating differentials that reads$^{9}$

$$
\mathrm{d} x^{\mu^{\prime}}=\frac{\partial x^{\mu^{\prime}}}{\partial x^{\nu}} \mathrm{d} x^{\nu} \tag{2.12}
$$

Thus, we conclude that

$$
\Lambda^{\mu^{\prime}}{ }_{\nu}=\frac{\partial x^{\mu^{\prime}}}{\partial x^{\nu}} \tag{2.13}
$$

Thus, transforming components from an unprimed to a primed frame uses this partial derivative which varies a coordinate in the primed frame with respect to a coordinate in the unprimed frame, keeping other coordinates in the unprimed frame fixed. We say that the components of vectors transform like differentials.$^{10}$
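The matrix rule of eqns 2.10 and 2.11 can be tried out numerically. The following is a minimal sketch in plain Python, working in units with $c=1$; the helper names `boost_x` and `transform` and the sample event are illustrative assumptions, not notation from the text.

```python
import math

def boost_x(beta):
    """Matrix of eqn 2.10: a boost at speed beta along x (units with c = 1)."""
    gamma = 1.0 / math.sqrt(1.0 - beta**2)
    return [
        [gamma, -beta * gamma, 0.0, 0.0],
        [-beta * gamma, gamma, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0, 1.0],
    ]

def transform(lam, X):
    """Components in S': X^{mu'} = Lambda^{mu'}_nu X^nu, summed over nu."""
    return [sum(lam[mu][nu] * X[nu] for nu in range(4)) for mu in range(4)]

# Illustrative event lying on the line x = 0.6 t, the path of the origin of S'.
X = [5.0, 3.0, 0.0, 0.0]
X_primed = transform(boost_x(0.6), X)
print(X_primed)
```

For this choice of event the primed spatial component should vanish up to rounding error, since the event sits on the path traced out by the origin of $S^{\prime}$.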
A key property of the Lorentz transformation is that it preserves the length of a vector, which is a quantity obtained by taking the scalar product of a vector with itself. The scalar product is a rule for combining vectors that we write

$$
\begin{align*}
\boldsymbol{X} \cdot \boldsymbol{Y} & =\left(X^{\mu} \boldsymbol{e}_{\mu}\right) \cdot\left(Y^{\nu} \boldsymbol{e}_{\nu}\right) \\
& =\left(\boldsymbol{e}_{\mu} \cdot \boldsymbol{e}_{\nu}\right) X^{\mu} Y^{\nu} \tag{2.14}
\end{align*}
$$

The object $\left(\boldsymbol{e}_{\mu} \cdot \boldsymbol{e}_{\nu}\right)$ is a matrix giving a rule for combining vectors. In flat space, this matrix is defined to be $\eta_{\mu \nu} \equiv \boldsymbol{e}_{\mu} \cdot \boldsymbol{e}_{\nu}$ and written out in full as

$$
\eta_{\mu \nu}=\begin{pmatrix} -1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \tag{2.15}
$$

This matrix is called the Minkowski metric tensor or Minkowski metric, for short.$^{11}$ The rule for combining two 4-vectors is, therefore,

$$
\begin{align*}
\boldsymbol{X} \cdot \boldsymbol{Y} & =\eta_{\mu \nu} X^{\mu} Y^{\nu} \\
& =-X^{0} Y^{0}+X^{1} Y^{1}+X^{2} Y^{2}+X^{3} Y^{3} \tag{2.16}
\end{align*}
$$

The minus sign in $\eta_{00}$ is chosen to fit with our definition of $\mathrm{d} s^{2}$, so that we have $\mathrm{d} s^{2}=\mathrm{d} \boldsymbol{x} \cdot \mathrm{d} \boldsymbol{x}$ which, when written in terms of components, becomes $\mathrm{d} s^{2}=\eta_{\mu \nu} \mathrm{d} x^{\mu} \mathrm{d} x^{\nu}$.
We can summarize the key expressions involving the Minkowski tensor as follows:

$$
\begin{align*}
\eta_{\mu \nu} & =\boldsymbol{e}_{\mu} \cdot \boldsymbol{e}_{\nu}, \tag{2.17}\\
\boldsymbol{X} \cdot \boldsymbol{Y} & =\eta_{\mu \nu} X^{\mu} Y^{\nu}, \tag{2.18}\\
\mathrm{d} s^{2} & =\eta_{\mu \nu} \mathrm{d} x^{\mu} \mathrm{d} x^{\nu} . \tag{2.19}
\end{align*}
$$

Just as an interval $\mathrm{d} s$ can be timelike, spacelike or null, we can classify a vector $\boldsymbol{X}$ in terms of its square $\boldsymbol{X}^{2}=\boldsymbol{X} \cdot \boldsymbol{X}$ by saying

$$
\begin{array}{ll}
\text{Spacelike vector} & \boldsymbol{X}^{2}>0, \\
\text{Null vector} & \boldsymbol{X}^{2}=0, \\
\text{Timelike vector} & \boldsymbol{X}^{2}<0 .
\end{array} \tag{2.20}
$$

Consider a light cone (see Fig. 2.4) based at a point $\mathcal{P}$. Timelike vectors starting from $\mathcal{P}$ can only exist within the forward or backward light cones. Spacelike vectors exist outside of the light cones while null vectors lie on the light cones. Light cones are sometimes called absolute surfaces as they always allow us to separate intervals and vectors in this way.
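The classification in eqn 2.20 amounts to checking the sign of $\eta_{\mu \nu} X^{\mu} X^{\nu}$. Here is a small sketch in plain Python; the names `ETA_DIAG`, `square` and `classify`, and the rounding tolerance `tol`, are assumptions made for illustration.

```python
# Diagonal entries of the Minkowski metric of eqn 2.15, signature (-,+,+,+).
ETA_DIAG = [-1.0, 1.0, 1.0, 1.0]

def square(X):
    """X.X = eta_{mu nu} X^mu X^nu for a diagonal metric."""
    return sum(ETA_DIAG[mu] * X[mu] * X[mu] for mu in range(4))

def classify(X, tol=1e-12):
    """Label a 4-vector following eqn 2.20; tol guards against rounding error."""
    s = square(X)
    if s > tol:
        return "spacelike"
    if s < -tol:
        return "timelike"
    return "null"

print(classify([1.0, 0.0, 0.0, 0.0]))  # points along the time axis
print(classify([0.0, 1.0, 0.0, 0.0]))  # points along the x axis
print(classify([1.0, 1.0, 0.0, 0.0]))  # lies on the light cone
```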

Example 2.4

The Lorentz transformation preserves the length of a vector, which is therefore a Lorentz invariant. This means that we can write

$$
\boldsymbol{X} \cdot \boldsymbol{X}=\eta_{\mu \nu} X^{\mu} X^{\nu}=\eta_{\mu^{\prime} \nu^{\prime}} X^{\mu^{\prime}} X^{\nu^{\prime}} \tag{2.21}
$$
$^{11}$ Some jargon: The signature of the metric tensor is the number of positive, negative or zero eigenvalues. Here, our metric is diagonal and so the eigenvalues can be read off very simply. The Minkowski metric tensor in eqn 2.15 has eigenvalues $-1$, 1, 1 and 1 and so its signature can be written as $(3,1,0)$ (3 plusses, 1 minus, no zeros) or, more commonly, as $(-,+,+,+)$ (enumerating the signs of the eigenvalues). Positive definite metrics $(+,+,+,+)$ are called Riemannian. A Lorentzian metric has a signature of one minus and the rest plusses, such as $(-,+,+,+)$ as in eqn 2.15, or one plus and the rest minuses, as in $(+,-,-,-)$. These latter metrics are known as pseudo Riemannian.
Fig. 2.4 The anatomy of a light cone, showing timelike, spacelike and null vectors.
However, this equation implies that

$$
\eta_{\mu^{\prime} \nu^{\prime}} X^{\mu^{\prime}} X^{\nu^{\prime}}=\eta_{\mu^{\prime} \nu^{\prime}}\left(\Lambda^{\mu^{\prime}}{ }_{\mu} X^{\mu}\right)\left(\Lambda^{\nu^{\prime}}{ }_{\nu} X^{\nu}\right)=\eta_{\mu \nu} X^{\mu} X^{\nu} \tag{2.22}
$$

which is true for any choice of $\boldsymbol{X}$ and hence

$$
\eta_{\mu^{\prime} \nu^{\prime}} \Lambda^{\mu^{\prime}}{ }_{\mu} \Lambda^{\nu^{\prime}}{ }_{\nu}=\eta_{\mu \nu} \tag{2.23}
$$

This identity can then be used to show that, for two different vectors,

$$
\boldsymbol{X} \cdot \boldsymbol{Y}=\eta_{\mu \nu} X^{\mu} Y^{\nu}=\eta_{\mu^{\prime} \nu^{\prime}} X^{\mu^{\prime}} Y^{\nu^{\prime}} \tag{2.24}
$$

and hence the scalar product $\boldsymbol{X} \cdot \boldsymbol{Y}$ is Lorentz invariant. This will be very useful.
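Equations 2.23 and 2.24 can be checked numerically for the boost of eqn 2.10. The sketch below uses plain Python in units with $c=1$; the helper names, the value $\beta=0.8$ and the two sample vectors are illustrative assumptions.

```python
import math

ETA = [[-1.0, 0.0, 0.0, 0.0],
       [0.0, 1.0, 0.0, 0.0],
       [0.0, 0.0, 1.0, 0.0],
       [0.0, 0.0, 0.0, 1.0]]

def boost_x(beta):
    """Lorentz boost along x, as in eqn 2.10 (units with c = 1)."""
    gamma = 1.0 / math.sqrt(1.0 - beta**2)
    return [[gamma, -beta * gamma, 0.0, 0.0],
            [-beta * gamma, gamma, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def transform(lam, X):
    """X^{mu'} = Lambda^{mu'}_nu X^nu."""
    return [sum(lam[m][n] * X[n] for n in range(4)) for m in range(4)]

def dot(X, Y):
    """Scalar product eta_{mu nu} X^mu Y^nu of eqn 2.16."""
    return sum(ETA[m][n] * X[m] * Y[n] for m in range(4) for n in range(4))

def pulled_back_metric(lam):
    """Left-hand side of eqn 2.23: eta_{a b} Lambda^a_mu Lambda^b_nu."""
    return [[sum(ETA[a][b] * lam[a][m] * lam[b][n]
                 for a in range(4) for b in range(4))
             for n in range(4)] for m in range(4)]

lam = boost_x(0.8)
X = [2.0, 1.0, -3.0, 0.5]
Y = [1.0, 4.0, 0.0, -2.0]

# Largest deviation of the left-hand side of eqn 2.23 from eta itself.
metric_error = max(abs(pulled_back_metric(lam)[m][n] - ETA[m][n])
                   for m in range(4) for n in range(4))
# Difference between the scalar product computed in S and in S' (eqn 2.24).
dot_error = abs(dot(X, Y) - dot(transform(lam, X), transform(lam, Y)))
print(metric_error, dot_error)
```

Both printed numbers should be zero up to floating-point rounding.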

Example 2.5

Vectors exist independently of any coordinate system. Therefore, the object $\boldsymbol{X}=X^{\mu} \boldsymbol{e}_{\mu}$ doesn't change as it is transformed. This allows us to work out how the basis vectors $\boldsymbol{e}_{\mu}$ themselves transform. We write the transformation

$$
\boldsymbol{e}_{\alpha^{\prime}}=\Lambda^{\mu}{ }_{\alpha^{\prime}} \boldsymbol{e}_{\mu} \tag{2.25}
$$

by analogy with eqn 2.9. In order for the vector $\boldsymbol{X}$ itself to be coordinate independent, the product of the transformations of the components $X^{\mu}$ and basis vectors $\boldsymbol{e}_{\mu}$ must yield the identity. That is to say that $\boldsymbol{X}$ can be written equivalently as

$$
\boldsymbol{X}=X^{\mu} \boldsymbol{e}_{\mu}=X^{\alpha^{\prime}} \boldsymbol{e}_{\alpha^{\prime}} \tag{2.26}
$$
$^{12}$ Note that, in these expressions full of components, we are free to reorder the terms for convenience.
$^{13}$ The symbol $\delta^{\nu}{ }_{\mu}$ is another version of the Kronecker delta which is defined by

$$
\delta^{\nu}{ }_{\mu}= \begin{cases}1 & \mu=\nu \\ 0 & \mu \neq \nu\end{cases} \tag{2.28}
$$

This particular form of the Kronecker delta, with one index up and the other down, is needed for reasons that will be explored after we have considered tensors in Chapter 4 (see in particular Exercise 4.2).
$^{14}$ You can check this is true using the matrix in eqn 2.10, noting that the sign of the velocity $\beta=v$ is flipped in the inverse operation which transforms back from the primed frame to the unprimed frame.
and application of eqns 2.9 and 2.25 implies that$^{12}$

$$
\boldsymbol{X}=X^{\alpha^{\prime}} \boldsymbol{e}_{\alpha^{\prime}}=\left(\Lambda^{\alpha^{\prime}}{ }_{\mu} X^{\mu}\right)\left(\Lambda^{\nu}{ }_{\alpha^{\prime}} \boldsymbol{e}_{\nu}\right)=\left(\Lambda^{\alpha^{\prime}}{ }_{\mu} \Lambda^{\nu}{ }_{\alpha^{\prime}}\right) X^{\mu} \boldsymbol{e}_{\nu}=X^{\mu} \boldsymbol{e}_{\mu} \tag{2.27}
$$

where the last equality relies on $\Lambda^{\alpha^{\prime}}{ }_{\mu} \Lambda^{\nu}{ }_{\alpha^{\prime}}=\delta^{\nu}{ }_{\mu}$, the identity operation.$^{13}$ Thus, we identify the inverse of the Lorentz transformation $\Lambda^{\alpha^{\prime}}{ }_{\mu}$ which we write

$$
\left(\Lambda^{\alpha^{\prime}}{ }_{\mu}\right)^{-1}=\Lambda^{\mu}{ }_{\alpha^{\prime}}
$$

To summarize: the inverse of the matrix $\Lambda^{\alpha^{\prime}}{ }_{\mu}$ is the matrix$^{14}$ $\Lambda^{\mu}{ }_{\alpha^{\prime}}$. We deduce that the basis vectors transform using the inverse of the Lorentz transformation used for the vector components. The key equations for identifying inverses are

$$
\Lambda^{\alpha^{\prime}}{ }_{\beta} \Lambda^{\beta}{ }_{\gamma^{\prime}}=\delta^{\alpha^{\prime}}{ }_{\gamma^{\prime}} \quad \text{and} \quad \Lambda^{\beta}{ }_{\alpha^{\prime}} \Lambda^{\alpha^{\prime}}{ }_{\gamma}=\delta^{\beta}{ }_{\gamma} \tag{2.29}
$$
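The inverse relations of eqn 2.29 can be checked for the boost matrix of eqn 2.10 by flipping the sign of $\beta$. A minimal sketch in plain Python follows, in units with $c=1$; the helper names `boost_x` and `matmul` and the value $\beta=0.6$ are assumptions made for illustration.

```python
import math

def boost_x(beta):
    """Lorentz boost along x, as in eqn 2.10 (units with c = 1)."""
    gamma = 1.0 / math.sqrt(1.0 - beta**2)
    return [[gamma, -beta * gamma, 0.0, 0.0],
            [-beta * gamma, gamma, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def matmul(A, B):
    """Matrix product (A B)_{ik} = sum_j A_{ij} B_{jk}."""
    return [[sum(A[i][j] * B[j][k] for j in range(4)) for k in range(4)]
            for i in range(4)]

beta = 0.6
forward = boost_x(beta)    # unprimed -> primed components
inverse = boost_x(-beta)   # candidate inverse: the same boost with beta -> -beta

product = matmul(inverse, forward)
# Largest deviation of the product from the Kronecker delta (identity matrix).
identity_error = max(abs(product[i][k] - (1.0 if i == k else 0.0))
                     for i in range(4) for k in range(4))
print(identity_error)
```

The printed deviation should be zero up to rounding, which is the statement that the two matrices are inverses of each other.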

Example 2.6

We saw earlier that the components of vectors, carrying an up index, transform like differentials. The rule for transforming objects with down indices, such as the basis vectors, is that they transform like derivatives

$$
\frac{\partial}{\partial y^{\alpha^{\prime}}}=\left(\frac{\partial x^{\mu}}{\partial y^{\alpha^{\prime}}}\right) \frac{\partial}{\partial x^{\mu}} \tag{2.30}
$$

where the derivative involves varying a coordinate in the unprimed frame with respect to a coordinate in the primed frame, keeping other coordinates in the primed frame fixed. Thus, another down-indexed object, the gradient vector $\partial_{\mu} \phi \equiv \partial \phi / \partial x^{\mu}$, transforms as

$$
\frac{\partial \phi}{\partial x^{\mu^{\prime}}}=\left(\frac{\partial x^{\nu}}{\partial x^{\mu^{\prime}}}\right) \frac{\partial \phi}{\partial x^{\nu}} \tag{2.31}
$$

The jargon is that $a^{\mu}$ transforms like a contravariant vector and $\partial \phi / \partial x^{\mu} \equiv \partial_{\mu} \phi$ transforms like a covariant vector,$^{15}$ though we avoid these terms and just note that the component $a^{\mu}$ has its indices 'upstairs' and $\partial_{\mu} \phi$ has them 'downstairs' and they then transform accordingly. We can also construct the 4-vector analogue of $\vec{\nabla}^{2}$ using$^{16}$ $\partial^{2}=\eta^{\mu \nu} \partial_{\mu} \partial_{\nu}$, and this will turn out to be very useful in the theory of gravitational waves.
In summary, we will insist that an object with indices in a downstairs position like $a_{\mu}$ transforms as

$$
a_{\mu^{\prime}}=\Lambda^{\nu}{ }_{\mu^{\prime}} a_{\nu} \tag{2.32}
$$

where $\Lambda^{\nu}{ }_{\mu^{\prime}} \equiv\left(\partial x^{\nu} / \partial x^{\mu^{\prime}}\right)$ is the inverse of the Lorentz transformation matrix $\Lambda^{\mu^{\prime}}{ }_{\nu}$.
It is a good moment to summarize some of our key results so far:
  • A vector $\boldsymbol{X}$ is an arrow in space. It can be written in components as $\boldsymbol{X}=X^{\mu} \boldsymbol{e}_{\mu}$. The components transform according to $X^{\mu^{\prime}}=\Lambda^{\mu^{\prime}}{ }_{\nu} X^{\nu}$, in the same way as a differential $\mathrm{d} x^{\mu}$.
  • The basis vectors have downstairs components and transform according to $\boldsymbol{e}_{\mu^{\prime}}=\Lambda^{\nu}{ }_{\mu^{\prime}} \boldsymbol{e}_{\nu}$ (i.e. the inverse transformation), in the same way as a gradient $\partial_{\mu}=\partial / \partial x^{\mu}$.
  • The scalar product $\boldsymbol{X} \cdot \boldsymbol{Y}=\eta_{\mu \nu} X^{\mu} Y^{\nu}$ is Lorentz invariant.
$^{15}$ These unfortunate terms are due to the English mathematician J. J. Sylvester (1814-1897). Both types of vectors transform covariantly, in the sense of 'properly', and we wish to retain this sense of the word 'covariant' rather than using it to simply label one type of object that transforms properly. Thus, we usually specify whether the indices on a particular object are 'upstairs' (like $a^{\mu}$) or 'downstairs' (like $\partial_{\mu} \phi$) and their transformation properties can then be deduced accordingly.
$^{16}$ We define $\partial^{2}$ as the scalar product $\eta^{\mu \nu} \partial_{\mu} \partial_{\nu}$ so that

$$
\begin{aligned}
\partial^{2} & =-\frac{\partial^{2}}{\partial t^{2}}+\frac{\partial^{2}}{\partial x^{2}}+\frac{\partial^{2}}{\partial y^{2}}+\frac{\partial^{2}}{\partial z^{2}} \\
& =-\frac{\partial^{2}}{\partial t^{2}}+\vec{\nabla}^{2}
\end{aligned}
$$

We will show in Chapter 4 (page 45) that $\eta^{\mu \nu}$ behaves as the same matrix as $\eta_{\mu \nu}$.

2.3 Examples of vectors

In ordinary Euclidean space we have vectors such as $\vec{r}$, the position vector, which is obviously an arrow in space, but also current density $\vec{J}$ and acceleration $\vec{a}$ which are also vectors but somehow live in different spaces. For spacetime vectors we have an analogous situation and some commonly used vectors are listed in Table 2.1, all of which transform appropriately under Lorentz transformations.

| Physical quantity | 4-vector $\boldsymbol{v}$ | Invariant $\boldsymbol{v} \cdot \boldsymbol{v}$ |
| :--- | :--- | :--- |
| Interval | $\mathrm{d} \boldsymbol{s}$ | $-\mathrm{d} \tau^{2}$ |
| Position | $\boldsymbol{x}$ | $-\tau^{2}$ |
| Velocity | $\boldsymbol{u}=\mathrm{d} \boldsymbol{x} / \mathrm{d} \tau$ | $-1$ |
| Momentum | $\boldsymbol{p}=m \boldsymbol{u}$ | $-m^{2}$ |
| Force | $\boldsymbol{f}=\mathrm{d} \boldsymbol{p} / \mathrm{d} \tau$ | $m^{2}\lvert\vec{a}\rvert^{2}$ |
| Acceleration | $\boldsymbol{a}=\mathrm{d} \boldsymbol{u} / \mathrm{d} \tau$ | $\lvert\vec{a}\rvert^{2}$ |
| Current | $\boldsymbol{J}=n_{0} \boldsymbol{u}$ | $-n_{0}^{2}$ |

Table 2.1 Commonly used 4-vectors. *The position vector transforms correctly under Lorentz transformations, but does not transform correctly under general coordinate transformations (see Chapter 3).
(a) Interval and position: The first one in our list is our old friend the interval $\mathrm{d} \boldsymbol{s}$. We can also define a spacetime position 4-vector $\boldsymbol{x}$ which is the non-infinitesimal version of the same thing. In coordinates, we could write $x^{\mu}=(t, \vec{x})$. As we have seen before, its invariant (the scalar product of itself with itself) is $\boldsymbol{x} \cdot \boldsymbol{x}=-t^{2}+x^{2}+y^{2}+z^{2}=-\tau^{2}$, where $\tau$ is the proper time.
The position vector $x^{\mu}$ works fine in special relativity, but is the one vector in our list in Table 2.1 which will not upgrade nicely to general relativity. We will still be able to work with infinitesimal displacements, such as $\mathrm{d} x^{\mu}$. The other vectors in our list will still be extremely useful in general relativity.
$^{17}$ That is, it does transform properly under Lorentz transformations.
Velocity component trick: from the velocity vector $\boldsymbol{u}$ with components $\left(u^{0}, u^{i}\right)=\left(\mathrm{d} t / \mathrm{d} \tau, \mathrm{d} x^{i} / \mathrm{d} \tau\right)$ we can extract a spatial part by saying

$$
u^{i}=\frac{\mathrm{d} x^{i}}{\mathrm{~d} \tau}=\frac{\mathrm{d} x^{i}}{\mathrm{~d} t} \frac{\mathrm{~d} t}{\mathrm{~d} \tau}=v^{i} u^{0}
$$

$^{18}$ For a slow-moving object (i.e. slow compared to light speed) we have $\gamma \approx 1$ and this reduces to

$$
u^{\mu} \approx(1, \vec{v}) \tag{2.35}
$$

$^{19}$ Remember that massive particles, which we assume we're describing here, have timelike velocity vectors with negative $|\boldsymbol{u}|^{2}$, as we find.
Fig. 2.5 The velocities $\boldsymbol{u}$, $\boldsymbol{v} / \gamma$ and $\boldsymbol{v}_{\text{rel}}$ in the reference frame of observer $U$.
(b) Velocity: Next, let's try and find the velocity. It's tempting to write this as $\mathrm{d} \boldsymbol{x} / \mathrm{d} t$ with components $\mathrm{d} x^{\mu} / \mathrm{d} t=(1, \mathrm{d} \vec{x} / \mathrm{d} t)$ but this is not a 4-vector because it does not transform properly. You can see this easily by taking the scalar product of it with itself

$$
\frac{\mathrm{d} \boldsymbol{x}}{\mathrm{d} t} \cdot \frac{\mathrm{d} \boldsymbol{x}}{\mathrm{d} t}=-1+v^{2}=-\frac{1}{\gamma^{2}} \tag{2.33}
$$

where $v=|\mathrm{d} \vec{x} / \mathrm{d} t|$ is the magnitude of the 3-velocity. This clearly depends on which frame you are in and is not an invariant. The solution is to differentiate $\boldsymbol{x}$ not with respect to time $t$ but with respect to proper time $\tau$. This gives us a velocity that is Lorentz covariant,$^{17}$ defined by

$$
\boldsymbol{u}=\frac{\mathrm{d} \boldsymbol{x}}{\mathrm{d} \tau} \tag{2.34}
$$

The velocity vector can be thought of as being the tangent to the world line of a particle. Using the equation $\mathrm{d} t / \mathrm{d} \tau=\gamma$ from the last chapter, we can deduce that the velocity $\boldsymbol{u}$ has components$^{18}$

$$
u^{\mu}=\left(\frac{\mathrm{d} t}{\mathrm{d} \tau}, \frac{\mathrm{d} x^{i}}{\mathrm{d} \tau}\right)=\left(\gamma, \gamma \frac{\mathrm{d} \vec{x}}{\mathrm{d} t}\right)=(\gamma, \gamma \vec{v}) . \tag{2.36}
$$

The useful invariant is$^{19}$

$$
\boldsymbol{u} \cdot \boldsymbol{u}=(\gamma, \gamma \vec{v}) \cdot(\gamma, \gamma \vec{v})=\gamma^{2}\left(-1+v^{2}\right)=-1 . \tag{2.37}
$$

This latter expression, confirmed in the next example, is used in computations throughout the book.
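The contrast between eqns 2.33 and 2.37 can be seen numerically. The following plain-Python sketch works in units with $c=1$; the function names and the sample 3-velocity are illustrative assumptions.

```python
import math

def minkowski_square(X):
    """X.X with signature (-,+,+,+), as in eqn 2.16."""
    return -X[0] ** 2 + X[1] ** 2 + X[2] ** 2 + X[3] ** 2

def four_velocity(v):
    """Components (gamma, gamma v) of eqn 2.36 for a 3-velocity v with |v| < 1."""
    v2 = sum(c * c for c in v)
    gamma = 1.0 / math.sqrt(1.0 - v2)
    return [gamma] + [gamma * c for c in v]

v = (0.3, 0.4, 0.0)             # sample 3-velocity, speed 0.5
u = four_velocity(v)
naive = [1.0, *v]               # dx^mu/dt = (1, v), which is not a 4-vector

print(minkowski_square(u))      # eqn 2.37: independent of v
print(minkowski_square(naive))  # eqn 2.33: equals -1 + v^2, so frame dependent
```

Changing `v` should leave the first number at $-1$ up to rounding, while the second tracks $-1+v^{2}=-1/\gamma^{2}$.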

Example 2.7

Three examples to illustrate 4-vector velocity:
(i) Define the 4-vector velocity of an observer by $\boldsymbol{u}$. In the observer's rest frame, by definition, the 3-velocity is zero. The observer's time is then the proper time $x^{0}=\tau$. As a result, the components of the 4-velocity in the observer's rest frame are $u^{\mu}=(1,0,0,0)$. Since $\eta_{00}=-1$, we also have $\boldsymbol{u}^{2}=\eta_{00} u^{0} u^{0}=-1$, as required.
(ii) Take any 4-vector $\boldsymbol{X}$. A useful result is that the timelike component of $\boldsymbol{X}$ in the observer's frame is then given by
\begin{equation*} X_{\mathrm{obs}}^{0}=-\boldsymbol{X} \cdot \boldsymbol{u} \tag{2.38} \end{equation*}
where $\boldsymbol{u}$ describes the observer's 4-velocity. Why is this? The great thing about 4-vector dot products is that if you work them out in one easy frame, the result holds for all frames. So let's choose the observer's frame, in which $u^{\mu}=(1,0,0,0)$ and $X^{\mu}=\left(X_{\mathrm{obs}}^{0}, X_{\mathrm{obs}}^{1}, X_{\mathrm{obs}}^{2}, X_{\mathrm{obs}}^{3}\right)$, so that $\boldsymbol{X} \cdot \boldsymbol{u}=-X_{\mathrm{obs}}^{0}$, as required.
(iii) Two observers U and V have velocity 4-vectors $\boldsymbol{u}$ and $\boldsymbol{v}$. Let's move into U's frame of reference, in which the velocities have components $u^{\mu}=(1,0)$ and $v^{\mu}=\left(\gamma, \gamma \vec{v}_{\mathrm{rel}}\right)$, where $\gamma=\left(1-v_{\mathrm{rel}}^{2}\right)^{-1 / 2}$ is appropriate for the relative 3-velocity $\vec{v}_{\mathrm{rel}}$ between the two observers. There is an elegant geometrical construction we can make by looking at $v^{\mu} / \gamma=\left(1, \vec{v}_{\mathrm{rel}}\right)$, which can be written as
\begin{equation*} \frac{\boldsymbol{v}}{\gamma}=\boldsymbol{u}+\boldsymbol{v}_{\mathrm{rel}} \tag{2.39} \end{equation*}
where $\boldsymbol{v}_{\mathrm{rel}}$, which has components $\left(0, \vec{v}_{\mathrm{rel}}\right)$, lies in the spacelike 3-space$^{20}$ of observer U (see Fig. 2.5). This means that $\boldsymbol{u} \cdot \boldsymbol{v}_{\mathrm{rel}}=0$ [since $u^{\mu}=(1,0)$ and $v_{\mathrm{rel}}^{\mu}=\left(0, \vec{v}_{\mathrm{rel}}\right)$]. Taking the scalar product of each side of eqn 2.39 with itself we have $-1 / \gamma^{2}=-1+v_{\mathrm{rel}}^{2}$, or
\begin{equation*} \gamma=\left(1-v_{\mathrm{rel}}^{2}\right)^{-\frac{1}{2}} \tag{2.40} \end{equation*}
as might be expected. The point is that we can take the scalar product of eqn 2.39 with $\boldsymbol{u}$ to obtain
\begin{equation*} \boldsymbol{u} \cdot \boldsymbol{v}=-\gamma \tag{2.41} \end{equation*}
which is a useful result.
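The invariants in eqns 2.37 and 2.41 can be checked numerically. The following is a minimal sketch, assuming units with $c=1$ and the signature $(-,+,+,+)$ used in this chapter; the function names `minkowski_dot` and `four_velocity` are illustrative choices, not notation from the text.

```python
import math

def minkowski_dot(a, b):
    """Scalar product with signature (-, +, +, +), in units where c = 1."""
    return -a[0] * b[0] + sum(x * y for x, y in zip(a[1:], b[1:]))

def four_velocity(v3):
    """Components (gamma, gamma * v) for a 3-velocity v3 with |v3| < 1."""
    gamma = 1.0 / math.sqrt(1.0 - sum(c * c for c in v3))
    return (gamma,) + tuple(gamma * c for c in v3)

# Work in U's rest frame: U is at rest, V moves at v_rel = 0.6 along x.
u = four_velocity((0.0, 0.0, 0.0))
v = four_velocity((0.6, 0.0, 0.0))

norm_v = minkowski_dot(v, v)        # eqn 2.37: should be -1 up to rounding
gamma_rel = -minkowski_dot(u, v)    # eqn 2.41: should equal gamma for v_rel

print(norm_v, gamma_rel)
```

For $v_{\mathrm{rel}}=0.6$ the Lorentz factor is $1.25$, so `gamma_rel` should agree with that value up to floating-point rounding.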
(c) Momentum: From the definition of velocity $\boldsymbol{u}$, it's a short step to the momentum $\boldsymbol{p}=m \boldsymbol{u}$, which then has invariant $\boldsymbol{p} \cdot \boldsymbol{p}=-m^{2}$ and components $p^{\mu}=(\gamma m, \gamma m \vec{v})=(E, \vec{p})$, using $E=\gamma m$ and $\vec{p}=\gamma m \vec{v}$.
Example 2.8
It's useful to remember that the 3-momentum $\vec{p}=\gamma m \vec{v}$ and energy $E=\gamma m$ are related via
\begin{equation*} \vec{p}=E \vec{v} \tag{2.42} \end{equation*}
Hence, the 4-momentum $\boldsymbol{p}=m \boldsymbol{u}$ can also be written as
\begin{equation*} p^{\mu}=\left(E, E v^{x}, E v^{y}, E v^{z}\right) \tag{2.43} \end{equation*}
This is helpful as this expression also applies to massless particles such as the photon (whose velocity is a null vector). We therefore take this latter equation to be true for light, giving us an expression for the photon momentum.
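A short numerical sketch of eqn 2.43 follows, assuming $c=1$ and signature $(-,+,+,+)$. The helper names and the sample values (mass 2, speed 0.6, photon energy 3) are arbitrary illustrative choices. It shows that the same expression gives $\boldsymbol{p}\cdot\boldsymbol{p}=-m^{2}$ for a massive particle and a null vector for a photon.

```python
import math

def norm_sq(p):
    """Minkowski norm p.p with signature (-, +, +, +), c = 1."""
    return -p[0] ** 2 + sum(c * c for c in p[1:])

def momentum_from_energy(energy, v3):
    """4-momentum in the form of eqn 2.43: (E, E v^x, E v^y, E v^z)."""
    return (energy,) + tuple(energy * c for c in v3)

# Massive particle: m = 2, speed 0.6 along x, so E = gamma * m.
m, v3 = 2.0, (0.6, 0.0, 0.0)
gamma = 1.0 / math.sqrt(1.0 - sum(c * c for c in v3))
p_massive = momentum_from_energy(gamma * m, v3)

# Photon: |v| = 1, here with energy E = 3 travelling along y.
p_photon = momentum_from_energy(3.0, (0.0, 1.0, 0.0))

print(norm_sq(p_massive))  # close to -m**2 = -4
print(norm_sq(p_photon))   # 0.0, a null vector
```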
(d) Force: Newton's second law may be written in terms of our new language as $\boldsymbol{f}=m \frac{\mathrm{d} \boldsymbol{u}}{\mathrm{d} \tau}$, or
\begin{equation*} \boldsymbol{f}=\frac{\mathrm{d} \boldsymbol{p}}{\mathrm{d} \tau} \tag{2.44} \end{equation*}
Note that since $\boldsymbol{u} \cdot \boldsymbol{u}=-1$, the result of differentiation with respect to $\tau$ is
\begin{equation*} \frac{\mathrm{d} \boldsymbol{u}}{\mathrm{d} \tau} \cdot \boldsymbol{u}+\boldsymbol{u} \cdot \frac{\mathrm{d} \boldsymbol{u}}{\mathrm{d} \tau}=2 \boldsymbol{u} \cdot \frac{\mathrm{d} \boldsymbol{u}}{\mathrm{d} \tau}=0 \tag{2.45} \end{equation*}
or equivalently $\boldsymbol{f} \cdot \boldsymbol{u}=0$. That is, the 4-force is perpendicular to the 4-velocity.

Example 2.9

The condition $\boldsymbol{f} \cdot \boldsymbol{u}=0$ provides a useful relation if we recall that $u^{\mu}=(\gamma, \gamma \vec{v})$. Writing $f^{\mu}=\left(f^{0}, \vec{f}\right)$, the dot product $\boldsymbol{f} \cdot \boldsymbol{u}$ yields
\begin{equation*} \left(f^{0}, \vec{f}\right) \cdot(\gamma, \gamma \vec{v})=-\gamma f^{0}+\gamma \vec{f} \cdot \vec{v}=0 \end{equation*}
which implies
\begin{equation*} f^{0}=\vec{f} \cdot \vec{v} \tag{2.46} \end{equation*}
Writing out $\boldsymbol{f}=\frac{\mathrm{d} \boldsymbol{p}}{\mathrm{d} \tau}$ in components
\begin{equation*} f^{\mu}=\left(\frac{\mathrm{d} t}{\mathrm{d} \tau} \frac{\mathrm{d} p^{0}}{\mathrm{d} t}, \frac{\mathrm{d} t}{\mathrm{d} \tau} \frac{\mathrm{d} p^{i}}{\mathrm{d} t}\right) \tag{2.47} \end{equation*}
$^{20}$ Here spacelike 3-space simply means the parts of the space where vectors are written in terms of spacelike components $v^{i}$ with $i=1,2,3$. Equation 2.39 can be checked by substituting in the components given in the example.
$^{21}$ It is often thought that special relativity cannot treat acceleration because it only deals with inertial frames. That is not the case. At each moment of time, an accelerating object can be thought of as being in an instantaneous rest frame moving at speed $v$, but that speed varies along the trajectory.
$^{22}$ In words: the acceleration as measured in the rest frame, $\mathrm{d} \vec{v} / \mathrm{d} t$, sometimes known as the proper acceleration, is found by evaluating the invariant $\boldsymbol{a}^{2}$, which gives precisely the square of this proper acceleration.
$^{23}$ The third scalar product in eqn 2.55 gives us $-a^{0} a^{0}+a^{1} a^{1}=g^{2}$ while the second one gives us $a^{1}=\left(u^{0} / u^{1}\right) a^{0}$. Putting these together gives $\left(a^{0}\right)^{2}\left[-1+\left(u^{0} / u^{1}\right)^{2}\right]=g^{2}$. Rearranging gives
\begin{equation*} a^{0}=\frac{g u^{1}}{\sqrt{-u^{1} u^{1}+u^{0} u^{0}}}=g u^{1} \end{equation*}
(using $-u^{0} u^{0}+u^{1} u^{1}=-1$ in the final step). The equation $a^{1}=g u^{0}$ is produced similarly. The final hyperbolic solution comes from differentiating eqn 2.56 with respect to $\tau$, giving $\mathrm{d}^{2} u^{0} / \mathrm{d} \tau^{2}=g^{2} u^{0}$ and $\mathrm{d}^{2} u^{1} / \mathrm{d} \tau^{2}=g^{2} u^{1}$, and then choosing solutions so that at proper time $\tau=0$ we have $t=0$ and $x$ is non-zero.
Fig. 2.6 The accelerated world line is a hyperbola.
and since $\mathrm{d} t / \mathrm{d} \tau=\gamma$, this simplifies to
\begin{equation*} f^{\mu}=\gamma\left(\frac{\mathrm{d} p^{0}}{\mathrm{d} t}, \frac{\mathrm{d} p^{i}}{\mathrm{d} t}\right) \tag{2.48} \end{equation*}
The 3-force $\vec{F}$ is $\vec{F}=\frac{\mathrm{d} \vec{p}}{\mathrm{d} t}$, and hence we deduce that
\begin{equation*} f^{\mu}=\left(f^{0}, \gamma \vec{F}\right) \tag{2.49} \end{equation*}
implying that $\vec{f}=\gamma \vec{F}$. Using eqn 2.46 we can write this as
\begin{equation*} f^{\mu}=(\gamma \vec{F} \cdot \vec{v}, \gamma \vec{F}) \tag{2.50} \end{equation*}
This result is consistent with the power dissipated being given by $\mathrm{d} E / \mathrm{d} t=\mathrm{d} p^{0} / \mathrm{d} t=\vec{F} \cdot \vec{v}$, familiar from classical mechanics.
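The orthogonality of the 4-force and the 4-velocity can be confirmed for sample numbers. This is a minimal sketch assuming $c=1$ and signature $(-,+,+,+)$; the function names and the particular 3-force and 3-velocity are arbitrary illustrative choices.

```python
import math

def minkowski_dot(a, b):
    """Scalar product with signature (-, +, +, +), in units where c = 1."""
    return -a[0] * b[0] + sum(x * y for x, y in zip(a[1:], b[1:]))

def lorentz_factor(v3):
    """gamma = (1 - v^2)^(-1/2) for a 3-velocity v3 with |v3| < 1."""
    return 1.0 / math.sqrt(1.0 - sum(c * c for c in v3))

def four_velocity(v3):
    """Components (gamma, gamma * v), as in eqn 2.36."""
    gamma = lorentz_factor(v3)
    return (gamma,) + tuple(gamma * c for c in v3)

def four_force(F3, v3):
    """Components (gamma F.v, gamma F), as in eqn 2.50."""
    gamma = lorentz_factor(v3)
    power = sum(f * c for f, c in zip(F3, v3))  # F.v, the rate of change of energy
    return (gamma * power,) + tuple(gamma * f for f in F3)

F3 = (1.0, 2.0, 0.5)   # sample 3-force
v3 = (0.3, 0.4, 0.0)   # sample 3-velocity, speed 0.5

f = four_force(F3, v3)
u = four_velocity(v3)
f_dot_u = minkowski_dot(f, u)

print(f_dot_u)  # zero up to floating-point rounding
```

Because $f^{0}=\gamma \vec{F}\cdot\vec{v}$ is built in from the start, the cancellation holds for any 3-force and any subluminal 3-velocity, not just the values chosen here.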
Using $\boldsymbol{f}=m \frac{\mathrm{d} \boldsymbol{u}}{\mathrm{d} \tau}$, we can express Newton's first law in terms of the velocity 4-vector and the proper time as
\begin{equation*} \frac{\mathrm{d} \boldsymbol{u}}{\mathrm{d} \tau}=0, \quad \text{or in component form} \quad \frac{\mathrm{d} u^{\mu}}{\mathrm{d} \tau}=0 \tag{2.51} \end{equation*}
(e) The acceleration$^{21}$ is given by $\boldsymbol{a}=\frac{\mathrm{d} \boldsymbol{u}}{\mathrm{d} \tau}$, and has components $a^{\mu}=\left(a^{0}, \vec{a}\right)$. From eqn 2.45, we have
\begin{equation*} \boldsymbol{a} \cdot \boldsymbol{u}=0 \tag{2.52} \end{equation*}
which implies $a^{0}=0$ in the rest frame of the observer [where $u^{\mu}=(1,0,0,0)$]. This means that, in the observer's instantaneous rest frame,$^{22}$
\begin{equation*} \boldsymbol{a} \cdot \boldsymbol{a}=\left(\frac{\mathrm{d} \vec{v}}{\mathrm{d} t}\right)^{2}=|\vec{a}|^{2} \tag{2.53} \end{equation*}

Example 2.10

A body is subjected to uniform acceleration $g$ in its instantaneous rest frame and $g$ is applied along $x^{1}$. In the instantaneous rest frame we have equations of motion
\begin{equation*} \frac{\mathrm{d} t}{\mathrm{d} \tau}=u^{0}, \quad \frac{\mathrm{d} x^{1}}{\mathrm{d} \tau}=u^{1}, \quad \frac{\mathrm{d} u^{0}}{\mathrm{d} \tau}=a^{0}, \quad \frac{\mathrm{d} u^{1}}{\mathrm{d} \tau}=a^{1} \tag{2.54} \end{equation*}
where $t=x^{0}$. Taking scalar products tells us three things
\begin{equation*} \boldsymbol{u} \cdot \boldsymbol{u}=-1, \quad \boldsymbol{u} \cdot \boldsymbol{a}=0, \quad \boldsymbol{a} \cdot \boldsymbol{a}=g^{2} \tag{2.55} \end{equation*}
Solving,$^{23}$ we obtain
\begin{equation*} a^{0}=\frac{\mathrm{d} u^{0}}{\mathrm{d} \tau}=g u^{1}, \quad a^{1}=\frac{\mathrm{d} u^{1}}{\mathrm{d} \tau}=g u^{0} \tag{2.56} \end{equation*}
The solutions to these equations are hyperbolic sines and cosines
\begin{equation*} t=\frac{1}{g} \sinh g \tau, \quad x=\frac{1}{g} \cosh g \tau \tag{2.57} \end{equation*}
We conclude that the accelerated world line is the hyperbola $x^{2}-t^{2}=g^{-2}$ (see Fig. 2.6). The velocity along the world line is
\begin{equation*} u^{0}=\frac{\mathrm{d} t}{\mathrm{d} \tau}=\cosh g \tau, \quad u^{1}=\frac{\mathrm{d} x^{1}}{\mathrm{d} \tau}=\sinh g \tau \tag{2.58} \end{equation*}
which satisfies $\boldsymbol{u} \cdot \boldsymbol{u}=-u^{0} u^{0}+u^{1} u^{1}=-1$. The particle's 3-velocity is
\begin{equation*} v=\frac{\mathrm{d} x^{1}}{\mathrm{d} t}=\frac{\mathrm{d} x^{1} / \mathrm{d} \tau}{\mathrm{d} t / \mathrm{d} \tau}=\tanh g \tau \tag{2.59} \end{equation*}
This never exceeds $v=1$, but approaches it for $\tau=\pm \infty$. The 4-acceleration is
\begin{equation*} a^{0}=g \sinh g \tau, \quad a^{1}=g \cosh g \tau \tag{2.60} \end{equation*}
and the magnitude is $|\boldsymbol{a}|=\sqrt{-\left(a^{0}\right)^{2}+\left(a^{1}\right)^{2}}=g$. The 4-force required for this acceleration is given by $f^{\mu}=m a^{\mu}$.
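The hyperbolic solution can be checked against the three scalar products in eqn 2.55. The sketch below assumes $c=1$, works in $1+1$ dimensions with signature $(-,+)$, and uses an arbitrary value $g=2$; the function names are illustrative only.

```python
import math

def hyperbolic_worldline(g, tau):
    """Position, 4-velocity and 4-acceleration at proper time tau
    for uniform proper acceleration g (eqns 2.57, 2.58 and 2.60)."""
    s, c = math.sinh(g * tau), math.cosh(g * tau)
    return {"t": s / g, "x": c / g, "u": (c, s), "a": (g * s, g * c)}

def dot2(a, b):
    """Scalar product in 1+1 dimensions with signature (-, +)."""
    return -a[0] * b[0] + a[1] * b[1]

g = 2.0
for tau in (-1.0, 0.0, 0.5, 1.5):
    w = hyperbolic_worldline(g, tau)
    print(
        tau,
        w["x"] ** 2 - w["t"] ** 2,   # hyperbola: 1 / g**2
        dot2(w["u"], w["u"]),        # -1
        dot2(w["u"], w["a"]),        # 0
        dot2(w["a"], w["a"]),        # g**2
        w["u"][1] / w["u"][0],       # 3-velocity, equal to tanh(g * tau)
    )
```

Each printed row should reproduce $g^{-2}$, $-1$, $0$ and $g^{2}$ up to rounding, with the last column staying strictly between $-1$ and $1$.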
(f) Particle current: This example refers to a cloud of particles. The particle current is $\boldsymbol{J}=n_{0} \boldsymbol{u}$, where $n_{0}$ is the number density of particles in their rest frame and $\boldsymbol{u}$ is their velocity. In the rest frame of the particles, we can write $J^{\mu}=n_{0}(1,0,0,0)$; in a general frame $J^{\mu}=\gamma n_{0}(1, \vec{u})$. The timelike component gives the number density $n$ [and in a general frame the density increases according to $n=\gamma n_{0}$ because of Lorentz contraction (Fig. 2.7)]. The spacelike components give the flux of particles along that direction: e.g. $J^{x}=\gamma n_{0} u^{x}=n u^{x}$ is the number of particles crossing the $yz$ plane in unit time.
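As a quick numerical illustration of the particle current, the sketch below assumes $c=1$, a rest-frame density $n_{0}=3$ and a cloud moving at speed $0.8$ along $x$ (all arbitrary choices, as is the function name).

```python
import math

def particle_current(n0, v3):
    """J^mu = gamma * n0 * (1, v) for rest-frame number density n0."""
    gamma = 1.0 / math.sqrt(1.0 - sum(c * c for c in v3))
    return (gamma * n0,) + tuple(gamma * n0 * c for c in v3)

n0 = 3.0
J = particle_current(n0, (0.8, 0.0, 0.0))

density = J[0]   # n = gamma * n0, larger than n0 because of Lorentz contraction
flux_x = J[1]    # n * u^x, particles crossing the yz plane per unit time and area
invariant = -J[0] ** 2 + sum(c * c for c in J[1:])  # J.J = -n0**2 in every frame

print(density, flux_x, invariant)
```

Here $\gamma=5/3$, so the density rises from 3 to about 5, while the invariant $\boldsymbol{J}\cdot\boldsymbol{J}$ stays at $-n_{0}^{2}$.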

2.4 Principle of least action

In the last section of this chapter, we shall show how some deep ideas in classical mechanics can be adapted for use in relativity. In contrast to using Newton's laws to work out how a system behaves, an alternative procedure was developed by the mathematician Joseph-Louis Lagrange to derive equations of motion from a variational principle. Thus, rather than starting with Newton's laws, we start with Hamilton's principle of least action, which we state below. The idea is to suppose that every mechanical system is characterized by a function called the Lagrangian, written$^{24}$ $L\left(q_{1}, q_{2}, \ldots, q_{n}, \dot{q}_{1}, \dot{q}_{2}, \ldots, \dot{q}_{n}, t\right)$, which is a function of the positions $q_{i}$ of each of the $n$ particles in the system, their velocities $v=\mathrm{d} q_{i} / \mathrm{d} t=\dot{q}_{i}$ and the time $t$. For simplicity we'll consider a single particle moving in one dimension, whose Lagrangian is then written $L(q, \dot{q}, t)$. Consider the trajectory of this particle as it travels from point $\mathcal{A}$ with coordinate $q\left(t_{1}\right)$ at time $t_{1}$ to a point $\mathcal{B}$ with coordinate $q\left(t_{2}\right)$ at time $t_{2}$. The action for this trajectory is defined to be
\begin{equation*} S=\int_{t_{1}}^{t_{2}} \mathrm{d} t\, L(q, \dot{q}, t) \tag{2.61} \end{equation*}
That is, we evaluate the Lagrangian at each point along the trajectory and add these up in the integral. What is the Lagrangian? In classical mechanics, Lagrange showed that it takes the form
\begin{equation*} L=(\text{Kinetic energy})-(\text{Potential energy}), \tag{2.62} \end{equation*}
and we shall work this out in some particular cases in Example 2.11.
Hamilton's principle of least action says that the action, when describing the motion that actually takes place subject to the laws of physics, takes an extremal value (i.e. a maximum, stationary or minimum value). That is, if we find the path $q(t)$ that extremizes the action, we have found the path the particle takes in travelling from $\mathcal{A}$ to $\mathcal{B}$. There are many possible paths, and some of these are drawn in Fig. 2.8. Finding the path that extremizes the action is a simple problem in the calculus of variations (see Section 1.4), and from that we can immediately conclude that the equation of motion governing any particle in the Universe is given by plugging the Lagrangian into the Euler-Lagrange equation.
Fig. 2.7 Length contraction increases the density of particles in a box owing to the shortening of the box length along the direction of travel.
Joseph-Louis Lagrange (1736-1813)
$^{24}$ We have assumed one-dimensional motion for each of the particles in this expression. In three dimensions, we write $L\left(\vec{q}_{1}, \vec{q}_{2}, \ldots, \vec{q}_{n}, \dot{\vec{q}}_{1}, \dot{\vec{q}}_{2}, \ldots, \dot{\vec{q}}_{n}, t\right)$.
Fig. 2.8 Different trajectories with $\delta q\left(t_{1}\right)=\delta q\left(t_{2}\right)=0$.
\begin{equation*} \frac{\mathrm{d}}{\mathrm{d} t} \frac{\partial L}{\partial \dot{q}}=\frac{\partial L}{\partial q} . \tag{2.63} \end{equation*}
If the motion is in several dimensions, and consequently described by several coordinates $x^{i}=\left(x^{1}, x^{2}, x^{3}, \ldots\right)$, we have an Euler-Lagrange equation for each coordinate, giving a set of equations of motion which are known as the Euler-Lagrange equations
\begin{equation*} \frac{\mathrm{d}}{\mathrm{d} t} \frac{\partial L}{\partial \dot{x}^{i}}=\frac{\partial L}{\partial x^{i}} . \tag{2.64} \end{equation*}
A solution can then be found by solving the entire set of equations. For our purposes, the fact that the Euler-Lagrange equations pick out the extremal values of the action $S$ also makes them useful in a geometrical context. We now turn to some very simple examples and applications of this concept.
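To make the extremal property concrete, the action can be evaluated numerically on a physical path and on neighbouring paths with the same end points. The following is a rough sketch, not a general solver: it assumes a unit-mass harmonic oscillator with $L=\frac{1}{2} \dot{q}^{2}-\frac{1}{2} q^{2}$, whose equation of motion $\ddot{q}=-q$ is solved by $q(t)=\sin t$, and a midpoint-rule discretization of eqn 2.61. The function names, step count and perturbation shape are illustrative assumptions.

```python
import math

def action(path, dt, lagrangian):
    """Midpoint-rule approximation to S = integral of L(q, qdot) dt (eqn 2.61)."""
    total = 0.0
    for q0, q1 in zip(path, path[1:]):
        q_mid = 0.5 * (q0 + q1)
        q_dot = (q1 - q0) / dt
        total += lagrangian(q_mid, q_dot) * dt
    return total

def sho_lagrangian(q, q_dot, m=1.0, k=1.0):
    """Kinetic minus potential energy for a simple harmonic oscillator."""
    return 0.5 * m * q_dot ** 2 - 0.5 * k * q ** 2

N, T = 1000, 1.0
dt = T / N
times = [i * dt for i in range(N + 1)]

# Physical path between q(0) = 0 and q(T) = sin(T).
true_path = [math.sin(t) for t in times]

def perturbed(eps):
    """Neighbouring path with the same end points: delta q(0) = delta q(T) = 0."""
    return [q + eps * math.sin(math.pi * t / T) for q, t in zip(true_path, times)]

S_true = action(true_path, dt, sho_lagrangian)
S_plus = action(perturbed(0.1), dt, sho_lagrangian)
S_minus = action(perturbed(-0.1), dt, sho_lagrangian)

print(S_true, S_plus, S_minus)
```

For this short time interval the physical path gives the smallest of the three actions, and `S_true` should be close to the exact value $\frac{1}{4} \sin 2$ obtained by integrating $\frac{1}{2}(\cos ^{2} t-\sin ^{2} t)$ from 0 to 1.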

Example 2.11

  • A free, non-relativistic particle has kinetic energy $\frac{1}{2} m \dot{x}^{2}$ and so has Lagrangian $L=\frac{1}{2} m \dot{x}^{2}$. The particle obeys the Euler-Lagrange equation, which here reduces to
\begin{equation*} 0=\frac{\partial L}{\partial x}-\frac{\mathrm{d}}{\mathrm{d} t} \frac{\partial L}{\partial \dot{x}}=0-\frac{\mathrm{d}}{\mathrm{d} t}(m \dot{x}), \tag{2.65} \end{equation*}
and we obtain $\ddot{x}=0$, implying that $\dot{x}$ is a constant of the motion.
  • Repeating this in three dimensions means the Lagrangian is $L=\frac{1}{2} m\left(\dot{x}^{2}+\dot{y}^{2}+\dot{z}^{2}\right)$ and we find that $\ddot{x}=\ddot{y}=\ddot{z}=0$, implying that $\dot{x}, \dot{y}$ and $\dot{z}$ are each individually constant, and that $\vec{v}=(\dot{x}, \dot{y}, \dot{z})$ is a constant vector. We conclude that, in an inertial frame, free motion takes place with a velocity that is constant in magnitude and direction. This is known as the law of inertia.
  • For a particle in a potential $V(x)$ we have $L=\frac{1}{2} m \dot{x}^{2}-V(x)$, and we obtain an equation of motion
\begin{equation*} m \ddot{x}=-\frac{\partial V}{\partial x} \tag{2.66} \end{equation*}
which expresses Newton's second law.
  • For a simple harmonic oscillator, $V(x)=\frac{1}{2} k x^{2}$, and so $L=\frac{1}{2} m v^{2}-\frac{1}{2} k x^{2}$, giving an equation of motion $m \ddot{x}=-k x$.
  • For Newtonian gravitation
\begin{equation*} L=\frac{1}{2} m\left(\dot{x}^{2}+\dot{y}^{2}+\dot{z}^{2}\right)+\frac{G M m}{|\vec{r}|} \tag{2.67} \end{equation*}
where $r^{2}=x^{2}+y^{2}+z^{2}$. We derive equations of motion
\begin{equation*} \ddot{x}=-G M \frac{x}{r^{3}}, \quad \ddot{y}=-G M \frac{y}{r^{3}}, \quad \ddot{z}=-G M \frac{z}{r^{3}} . \tag{2.68} \end{equation*}
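The step from the gravitational Lagrangian of eqn 2.67 to the equations of motion in eqn 2.68 can be checked numerically. The sketch below is illustrative only: it assumes units with $GM=1$ and $m=1$, estimates $\partial L / \partial x^{i}$ with a central finite difference of hypothetical step size `h`, and uses the fact that $\partial L / \partial \dot{x}^{i}=m \dot{x}^{i}$, so that eqn 2.64 reads $m \ddot{x}^{i}=\partial L / \partial x^{i}$.

```python
import math

G_M = 1.0    # product G * M, in arbitrary units (assumed value)
MASS = 1.0   # test-particle mass m (assumed value)

def lagrangian(pos, vel):
    """Eqn 2.67: L = (1/2) m |v|^2 + G M m / |r|."""
    r = math.sqrt(sum(c * c for c in pos))
    return 0.5 * MASS * sum(c * c for c in vel) + G_M * MASS / r

def acceleration_from_lagrangian(pos, vel, h=1e-5):
    """Acceleration from m * xddot^i = dL/dx^i, with the partial
    derivative estimated by a central difference of step h."""
    acc = []
    for i in range(3):
        plus, minus = list(pos), list(pos)
        plus[i] += h
        minus[i] -= h
        dL_dxi = (lagrangian(plus, vel) - lagrangian(minus, vel)) / (2.0 * h)
        acc.append(dL_dxi / MASS)
    return acc

pos = (1.0, 2.0, 2.0)     # r = 3
vel = (0.1, -0.2, 0.3)    # the velocity drops out of dL/dx^i
r = math.sqrt(sum(c * c for c in pos))

numeric = acceleration_from_lagrangian(pos, vel)
analytic = [-G_M * c / r ** 3 for c in pos]   # eqn 2.68

print(numeric)
print(analytic)
```

The two lists should agree to well within the finite-difference error, which is a useful sanity check before trusting a hand derivation for a more complicated Lagrangian.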
Hamilton's principle of least action is closely related to Fermat's principle of least time, the idea that light chooses a route that minimizes the travel time. This gives us a way of thinking about the Lagrangian for a relativistic particle in flat spacetime. We write the action as
\begin{equation*} S=-m \int \mathrm{d} \tau \tag{2.69} \end{equation*}
This is mass $m$ (which is an energy $m c^{2}$ if you put the factors of $c$ back in) multiplied by the path length in time. There is also a minus sign which expresses that when we minimize $S$ we maximize $\int \mathrm{d} \tau$ (and we have already argued from the twin paradox that the straight-line path is the longest, not shortest path). As we shall see, these choices give us the correct dynamics. First note that since $\mathrm{d} \tau=\mathrm{d} t / \gamma$, eqn 2.69 gives us a very simple Lagrangian$^{25}$
\begin{equation*} L=-\frac{m}{\gamma} \tag{2.70} \end{equation*}
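The role of the minus sign in eqn 2.69 can be seen by comparing two world lines between the same pair of events. This is a minimal sketch assuming $c=1$, one spatial dimension and piecewise-straight paths; the event lists and function names are invented for illustration.

```python
import math

def proper_time(events):
    """Total proper time along straight timelike segments joining
    successive (t, x) events: sum of sqrt(dt^2 - dx^2), with c = 1."""
    total = 0.0
    for (t0, x0), (t1, x1) in zip(events, events[1:]):
        dt, dx = t1 - t0, x1 - x0
        total += math.sqrt(dt * dt - dx * dx)
    return total

def action(m, events):
    """Free-particle action of eqn 2.69: S = -m * (proper time)."""
    return -m * proper_time(events)

# Both paths run from (t, x) = (0, 0) to (10, 0).
stay_at_home = [(0.0, 0.0), (10.0, 0.0)]
traveller = [(0.0, 0.0), (5.0, 3.0), (10.0, 0.0)]   # speed 0.6 out and back

tau_home = proper_time(stay_at_home)
tau_trav = proper_time(traveller)

print(tau_home, tau_trav)                                    # 10.0 8.0
print(action(1.0, stay_at_home), action(1.0, traveller))     # -10.0 -8.0
```

The straight world line has the longer proper time and therefore the smaller (more negative) action, which is the sense in which minimizing $S$ maximizes $\int \mathrm{d} \tau$.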

Example 2.12

More usefully for later discussion, we can rewrite the action in eqn 2.69 in terms of the components of the metric, so that
\begin{equation*} S=-m \int \sqrt{-\mathrm{d} s^{2}}=-m \int\left(-\eta_{\mu \nu} \mathrm{d} x^{\mu} \mathrm{d} x^{\nu}\right)^{\frac{1}{2}} \tag{2.71} \end{equation*}
We can show that the Lagrangian we have identified in eqn 2.70 makes sense by taking a non-relativistic limit for small velocities
\begin{equation*} L=-\frac{m}{\gamma}=-m\left[1-\frac{1}{2} v^{2}+\ldots\right]=-m+\frac{1}{2} m v^{2}-\ldots \tag{2.72} \end{equation*}
Ignoring the constant $-m$, which vanishes as soon as we differentiate the Lagrangian, we have the Lagrangian $L=\frac{1}{2} m v^{2}$ for a free particle in Newtonian mechanics. Having passed this test, we realize that eqn 2.70 is a fully working Lagrangian for free particles in special relativity: $L=-m / \gamma=-m\left(1-v^{2}\right)^{\frac{1}{2}}$. To analyse the mechanics of a system we need to deal with the forces that act on particles owing to the potential $V$ that they feel. In Newtonian physics, we would simply write $L=\frac{1}{2} m v^{2}-V$. However, the role of the potential energy in relativity turns out to be more subtle and depends on which potential we are considering.
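The quality of the non-relativistic limit can be gauged numerically. The sketch below assumes $c=1$ and $m=1$ and compares the exact Lagrangian $-m\left(1-v^{2}\right)^{1 / 2}$ with the truncated form $-m+\frac{1}{2} m v^{2}$; the function names are illustrative. From the binomial series for $\sqrt{1-v^{2}}$, the first neglected term has size $m v^{4} / 8$.

```python
import math

def lagrangian_relativistic(m, v):
    """Free-particle Lagrangian of eqn 2.70: L = -m / gamma = -m sqrt(1 - v^2)."""
    return -m * math.sqrt(1.0 - v * v)

def lagrangian_truncated(m, v):
    """First two terms of the small-v expansion in eqn 2.72."""
    return -m + 0.5 * m * v * v

def gap(m, v):
    """Difference between the exact and truncated Lagrangians."""
    return lagrangian_relativistic(m, v) - lagrangian_truncated(m, v)

m = 1.0
for v in (0.01, 0.1, 0.5):
    # The gap is close to m * v**4 / 8 for small v and grows as v approaches 1.
    print(v, gap(m, v), m * v ** 4 / 8.0)
```

At $v=0.01$ the two Lagrangians differ at roughly the $10^{-9}$ level, while at $v=0.5$ the difference is already close to one per cent of $m$.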

Example 2.13

Inserting a potential $\Phi$ into the Lagrangian in eqn 2.72 would give $L=-m+\frac{1}{2} m v^{2}-m \Phi+\cdots$. Since the action is $S=\int L \,\mathrm{d} t=-m \int \sqrt{-\mathrm{d} s^{2}}$, this would imply
\begin{equation*} \sqrt{-\mathrm{d} s^{2}}=\left(1-\frac{1}{2} v^{2}+\Phi\right) \mathrm{d} t . \tag{2.73} \end{equation*}
Hence, using $v^{2}=(\mathrm{d} x / \mathrm{d} t)^{2}+(\mathrm{d} y / \mathrm{d} t)^{2}+(\mathrm{d} z / \mathrm{d} t)^{2}$ we have (to leading order)$^{26}$
\begin{equation*} \mathrm{d} s^{2}=-(1+2 \Phi) \mathrm{d} t^{2}+\mathrm{d} x^{2}+\mathrm{d} y^{2}+\mathrm{d} z^{2} \tag{2.75} \end{equation*}
This is our first hint that a gravitational potential can change the metric of spacetime. However, this expression has been obtained in a non-relativistic limit so one should exercise some caution.$^{27}$
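The size of the terms discarded in going from eqn 2.73 to eqn 2.75 can be checked with sample numbers. In this sketch the values of $\Phi$ and $v$ are arbitrary small quantities and the function names are illustrative. Squaring eqn 2.73 gives $-\mathrm{d} s^{2}=\left[(1+\Phi)-\frac{1}{2} v^{2}\right]^{2} \mathrm{d} t^{2}$, while eqn 2.75 with $\mathrm{d} x^{2}+\mathrm{d} y^{2}+\mathrm{d} z^{2}=v^{2} \mathrm{d} t^{2}$ corresponds to the coefficient $(1+2 \Phi)-v^{2}$.

```python
def coefficient_full(phi, v):
    """Coefficient of dt^2 in -ds^2 from squaring eqn 2.73."""
    return (1.0 + phi - 0.5 * v * v) ** 2

def coefficient_leading(phi, v):
    """Leading-order coefficient (1 + 2 phi) - v^2, equivalent to eqn 2.75."""
    return 1.0 + 2.0 * phi - v * v

def dropped_terms(phi, v):
    """Higher-order terms discarded: phi^2 - v^2 phi + v^4 / 4."""
    return phi ** 2 - v * v * phi + v ** 4 / 4.0

phi, v = 1e-3, 1e-2   # a weak potential and a slow particle (c = 1)

difference = coefficient_full(phi, v) - coefficient_leading(phi, v)
print(difference, dropped_terms(phi, v))
```

Both printed numbers should be of order $10^{-6}$, compared with retained terms of order $10^{-3}$ and $10^{-4}$, which is why the truncation is reasonable for weak potentials and slow particles.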
${}^{25}$ Hence, $S=\int L \,\mathrm{d} t=-m \int \mathrm{d} \tau$ as required.
${}^{26}$ That is, we write
\begin{equation*} \begin{aligned} \mathrm{d} s^{2} & =-\left[(1+\Phi)-\frac{v^{2}}{2}\right]^{2} \mathrm{d} t^{2} \\ & =-\left[(1+\Phi)^{2}-(1+\Phi) v^{2}+\frac{v^{4}}{4}\right] \mathrm{d} t^{2} \\ & =-\left[\left(1+2 \Phi+\Phi^{2}\right)-v^{2}-v^{2} \Phi+\frac{v^{4}}{4}\right] \mathrm{d} t^{2}, \end{aligned} \end{equation*}
and drop higher order terms proportional to $\Phi^{2}$, $v^{2} \Phi$ and $v^{4}$.
${}^{27}$ The $v \ll c$ limit used in eqn 2.72 to derive eqn 2.75 means that this expression is only expected to work for timelike intervals close to the time axis, and so one might suspect that the spatial coordinates in eqn 2.75 might need tweaking due to the effects of the potential. That turns out to be the case (see eqn 5.22).
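The leading-order step from eqn 2.73 to eqn 2.75 can be checked numerically. The following is only a sketch in units with $c=1$; the function names and the sample values of $\Phi$ and $v$ are assumptions chosen for illustration and do not come from the text.

```python
import math

def proper_time_rate_metric(phi, v):
    """sqrt(-ds^2)/dt computed from eqn 2.75: sqrt((1 + 2*phi) - v**2)."""
    return math.sqrt((1.0 + 2.0 * phi) - v * v)

def proper_time_rate_leading(phi, v):
    """Leading-order form of eqn 2.73: 1 - v**2/2 + phi."""
    return 1.0 - 0.5 * v * v + phi

# Hypothetical weak potential and slow speed (both much less than 1)
phi, v = 1.0e-6, 1.0e-3

# The two expressions should differ only by terms of order
# phi**2, v**2 * phi and v**4, which were dropped in the derivation.
diff = abs(proper_time_rate_metric(phi, v) - proper_time_rate_leading(phi, v))
print(diff)
```

For these sample values the printed difference should be many orders of magnitude smaller than either $\Phi$ or $v^{2}$, which is what "to leading order" means here.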
This brings us on to a very important idea. In order to describe gravitation, Einstein did not simply put together a potential to contribute to a Lagrangian, but instead showed how gravitation changes the very properties of spacetime itself. This is done by having matter directly affect the metric and, by extension, the basic rules governing the intervals in spacetime. In general relativity, the action $S=-m \int\left(-\eta_{\mu \nu} \mathrm{d} x^{\mu} \mathrm{d} x^{\nu}\right)^{\frac{1}{2}}$ becomes
\begin{equation*} S=-m \int\left(-g_{\mu \nu} \mathrm{d} x^{\mu} \mathrm{d} x^{\nu}\right)^{\frac{1}{2}}, \tag{2.76} \end{equation*}
where the gravitating masses determine the form of the metric $g_{\mu \nu}$ (which, as we shall see, can then be different from the flat spacetime metric $\eta_{\mu \nu}$).

Chapter summary

  • Vectors in special relativity are four-dimensional. Their components can be transformed by the Lorentz transformations, but scalar products are invariant.
  • The Minkowski metric tensor, with components $\eta_{\mu \nu}$, allows us to make scalar products in flat spacetime.
  • The relativistic action is given in terms of the interval $\mathrm{d} s$.
  • The velocity vector is tangent to the world line of a particle.
  • An observer with 4-velocity $\boldsymbol{u}$ measures the timelike component of 4-vector $\boldsymbol{X}$ to be $-\boldsymbol{X} \cdot \boldsymbol{u}$.

Exercises

(2.1) Show that an observer travelling with velocity 4-vector $\boldsymbol{u}$ will deduce that, in their frame, the energy of a particle with 4-vector $\boldsymbol{p}$ is $E=-\boldsymbol{p} \cdot \boldsymbol{u}$.
(2.2) Show that eqn 2.41 ($\boldsymbol{u} \cdot \boldsymbol{v}=-\gamma$) implies that
\begin{equation*} \gamma=\gamma(\vec{u}) \gamma(\vec{v})(1-\vec{u} \cdot \vec{v}) \tag{2.77} \end{equation*}
and that in the non-relativistic limit this reduces to $\vec{v}_{\text{rel}}=\vec{u}-\vec{v}$ or $\vec{v}_{\text{rel}}=\vec{v}-\vec{u}$. Interpret these results physically.
(2.3) The momentum vector can be written in components as
\begin{equation*} p^{\mu}=(E, \vec{p}) . \tag{2.78} \end{equation*}
We can define the same object with a downstairs index by $p_{\mu}=\eta_{\mu \nu} p^{\nu}$. Show that
\begin{equation*} p_{\mu}=(-E, \vec{p}) . \tag{2.79} \end{equation*}
Show also that
\begin{equation*} \boldsymbol{p} \cdot \boldsymbol{p}=\eta_{\mu \nu} p^{\mu} p^{\nu}=p_{\nu} p^{\nu}=-E^{2}+p^{2}=-m^{2} . \tag{2.80} \end{equation*}
The gradient operator has components
\begin{equation*} \partial_{\mu}=\left(\frac{\partial}{\partial t}, \vec{\nabla}\right) . \tag{2.81} \end{equation*}
Prove the following relations:
(i) $\partial^{\mu}=(-\partial / \partial t, \vec{\nabla})$;
(ii) $\partial^{2}=\partial_{\mu} \partial^{\mu}=-\partial^{2} / \partial t^{2}+\vec{\nabla}^{2}$;
(iii) $\partial_{\mu} J^{\mu}=\partial \rho / \partial t+\vec{\nabla} \cdot \vec{J}$, where $J^{\mu}=(\rho, \vec{J})$;
(iv) $a_{\mu} u^{\mu}=0$.
(2.4) The Galaxy is about $10^{5}$ light years across and the most energetic cosmic rays known have energies of the order of $10^{19}\,\mathrm{eV}$. How long would it take a proton (rest mass $\approx 1\,\mathrm{GeV}$) with this energy to cross the Galaxy as measured in the rest frame of (i) the Galaxy and (ii) the proton?
(2.5) In its rest frame, a particle of mass $m$ will have 4-vector $p^{\mu}=(m, 0,0,0)$. Using the Lorentz transformation on this 4-vector, find the energy and momentum of a particle in a frame moving so that the particle has speed $v$. Check that the original and transformed 4-vector components give the same invariant.
(2.6) Use 4-vectors to show that an electron in free space cannot absorb a single photon.
(2.7) Using the principle of least action, calculate the shape traced out by a hanging string.
(2.8) The generalized momentum in Lagrangian mechanics is $\vec{p}=\partial L / \partial \vec{v}$. Show that with $L=-m / \gamma$, this yields $\vec{p}=m \gamma \vec{v}$.
(2.9) The Hamiltonian $H$ in Lagrangian mechanics is given by $H=\vec{p} \cdot \vec{v}-L$ where $\vec{p}$ is the momentum and $\vec{v}$ is the velocity. Show that this agrees with $E=\gamma m c^{2}$.

3.1 Coordinates in Euclidean space
3.2 Farewell to the position vector
3.3 Non-Euclidean space
Chapter summary
Exercises

Coordinates

I'll put a girdle round about the earth in forty minutes
William Shakespeare, A Midsummer Night's Dream (Act II, Scene I)
Do we actually need coordinates? In many cases, we are better off not using them. Take a relationship like the one that expresses Newton's second law
\begin{equation*} \boldsymbol{f}=\frac{\mathrm{d} \boldsymbol{p}}{\mathrm{d} \tau} . \tag{3.1} \end{equation*}
This is a relationship between two vectors (arrows in spacetime) and holds irrespective of the coordinates chosen. Yes, you could choose a frame in which you can write $f^{\mu}=\mathrm{d} p^{\mu} / \mathrm{d} \tau$, but if you transform into a second frame which is rotating with respect to the first, then eqn 3.1 will emerge with extra terms (centrifugal and Coriolis forces) and will look a lot more complicated, even though the same physics is being described. It's much better, whenever possible, to stay above the fray and focus on a coordinate-free approach in which you only deal with statements expressed in purely geometrical terms.
However, particular physical problems have a very nasty habit of requiring us to dive back down into the murky world of coordinates, rather than staying aloof in our heavenly geometrical realm. Sometimes, like Puck in A Midsummer Night's Dream, we need to put a coordinate girdle around the Earth. There are often occasions when we need coordinates to express the value quantities take in particular frames, or to allow us to exploit a particular symmetry. Therefore, in this chapter, we develop a few ideas about coordinates and, for a start, we will need to define some terms: Euclidean space, Cartesian coordinates, coordinate and non-coordinate bases, non-Euclidean space.

3.1 Coordinates in Euclidean space

Euclidean space is a set of points in $n$ dimensions in which the scalar product between two vectors $\boldsymbol{X}$ and $\boldsymbol{Y}$ is${}^{1}$ $\boldsymbol{X} \cdot \boldsymbol{Y}=\sum_{\mu} X^{\mu} Y^{\mu}$, so that the length of a vector $\boldsymbol{X}$ is $|\boldsymbol{X}|=\sqrt{\boldsymbol{X} \cdot \boldsymbol{X}}=\left(\sum_{\mu} X^{\mu} X^{\mu}\right)^{\frac{1}{2}}$ and the angle between $\boldsymbol{X}$ and $\boldsymbol{Y}$ is $\cos ^{-1}(\boldsymbol{X} \cdot \boldsymbol{Y} /|\boldsymbol{X}||\boldsymbol{Y}|)$. In other words, it is the flat space you have been using all your life, equipped with geometric axioms that date back to Euclid's Elements around 300 BC. Euclid focussed on the geometry of the plane ($n=2$), showing that the sum of the internal angles in a triangle adds up to $180^{\circ}$ and so forth, and so we will also choose $n=2$ for now.
Euclidean space is most often described using Cartesian coordinates, an innovation of René Descartes in 1637 and so, following his example, we describe any point in the plane using two numbers $x$ and $y$ that encode where on the Cartesian plane in Fig. 3.1(a) a particular point happens to lie. Of course, that's not the only way of describing the Euclidean plane. Polar coordinates${}^{2}$ are another option where the same point can be described by a distance $r$ from the origin and an angle $\theta$, as shown in Fig. 3.1(b).
These two sets of coordinates are related by the familiar equations
\begin{equation*} x=r \cos \theta, \quad y=r \sin \theta, \quad r=\left(x^{2}+y^{2}\right)^{\frac{1}{2}}, \quad \tan \theta=y / x . \tag{3.2} \end{equation*}
To express the components of a vector $\boldsymbol{X}=X^{\mu} \boldsymbol{e}_{\mu}$ in terms of the new coordinates we can use the formula for coordinate transformations (eqn 2.11, reserved until now for Lorentz transformations)
\begin{equation*} X^{\mu^{\prime}}=\Lambda^{\mu^{\prime}}{}_{\nu} X^{\nu}=\left(\frac{\partial x^{\mu^{\prime}}}{\partial x^{\nu}}\right) X^{\nu}, \tag{3.3} \end{equation*}
as demonstrated in the following example.
Example 3.1
We will consider the transformation between two-dimensional Cartesian coordinates and polar coordinates on the components of the infinitesimal-displacement vector $\mathrm{d} \boldsymbol{x}=\mathrm{d} x^{\mu} \boldsymbol{e}_{\mu}$ with components $\mathrm{d} x^{\mu}$. For the unprimed coordinates we write $x^{\mu}=(x, y)$. For the primed coordinates we write $x^{\alpha^{\prime}}=(r, \theta)$. The transformations are given by a matrix formed from the partial derivatives computed below
\begin{equation*} \begin{array}{ll} \left(\frac{\partial x}{\partial r}\right)_{\theta}=\cos \theta, & \left(\frac{\partial y}{\partial r}\right)_{\theta}=\sin \theta, \\ \left(\frac{\partial x}{\partial \theta}\right)_{r}=-r \sin \theta, & \left(\frac{\partial y}{\partial \theta}\right)_{r}=r \cos \theta, \end{array} \tag{3.4} \end{equation*}
and
\begin{align*} & \left(\frac{\partial r}{\partial x}\right)_{y}=\frac{x}{\sqrt{x^{2}+y^{2}}}, \quad\left(\frac{\partial r}{\partial y}\right)_{x}=\frac{y}{\sqrt{x^{2}+y^{2}}}, \tag{3.5}\\ & \left(\frac{\partial \theta}{\partial x}\right)_{y}=-\frac{y}{x^{2}+y^{2}}, \quad\left(\frac{\partial \theta}{\partial y}\right)_{x}=\frac{x}{x^{2}+y^{2}} . \end{align*}
We can write the derivatives in matrix form $\mathrm{d} x^{\alpha^{\prime}}=\left(\partial x^{\alpha^{\prime}} / \partial x^{\mu}\right) \mathrm{d} x^{\mu}$, which gives a transformation matrix
\begin{align*} \begin{pmatrix} \mathrm{d} r \\ \mathrm{d} \theta \end{pmatrix} & =\begin{pmatrix} \left(\frac{\partial r}{\partial x}\right)_{y} & \left(\frac{\partial r}{\partial y}\right)_{x} \\ \left(\frac{\partial \theta}{\partial x}\right)_{y} & \left(\frac{\partial \theta}{\partial y}\right)_{x} \end{pmatrix}\begin{pmatrix} \mathrm{d} x \\ \mathrm{d} y \end{pmatrix} \\ & =\begin{pmatrix} \cos \theta & \sin \theta \\ -\frac{1}{r} \sin \theta & \frac{1}{r} \cos \theta \end{pmatrix}\begin{pmatrix} \mathrm{d} x \\ \mathrm{d} y \end{pmatrix}, \tag{3.6} \end{align*}
where we have re-expressed the terms in the matrix using the polar coordinates.
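A quick numerical check of eqn 3.6 is sketched below. It is an illustration only: the helper names, the sample point and the small displacement are all assumptions, and `math.atan2` is used as a quadrant-safe stand-in for $\tan \theta=y / x$. The sketch compares the change in $(r, \theta)$ predicted by the matrix with the change obtained by converting the displaced point directly, and also confirms that the matrices built from eqns 3.4 and 3.6 are inverses of each other.

```python
import math

def to_polar(x, y):
    """Eqn 3.2: Cartesian (x, y) to polar (r, theta)."""
    return math.hypot(x, y), math.atan2(y, x)

def jacobian_polar_from_cartesian(r, theta):
    """Matrix of eqn 3.6: rows (dr, dtheta), columns (dx, dy)."""
    return [[math.cos(theta), math.sin(theta)],
            [-math.sin(theta) / r, math.cos(theta) / r]]

def jacobian_cartesian_from_polar(r, theta):
    """Matrix built from eqn 3.4: rows (dx, dy), columns (dr, dtheta)."""
    return [[math.cos(theta), -r * math.sin(theta)],
            [math.sin(theta), r * math.cos(theta)]]

def matmul2(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

x, y = 1.2, 0.7            # hypothetical sample point
dx, dy = 1.0e-6, -2.0e-6   # hypothetical small displacement
r, theta = to_polar(x, y)
J = jacobian_polar_from_cartesian(r, theta)

# Change in (r, theta) predicted by the matrix of eqn 3.6
dr_pred = J[0][0] * dx + J[0][1] * dy
dtheta_pred = J[1][0] * dx + J[1][1] * dy

# Change obtained by converting the displaced point directly
r2, theta2 = to_polar(x + dx, y + dy)
dr_true, dtheta_true = r2 - r, theta2 - theta

# The two transformation matrices should multiply to the identity
product = matmul2(J, jacobian_cartesian_from_polar(r, theta))
print(dr_pred - dr_true, dtheta_pred - dtheta_true, product)
```

The agreement is only to first order in the displacement, which is why the components being transformed are those of an infinitesimal displacement.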
Basis vectors transform the opposite way, as we found in eqn 2.25 which stated that
\begin{equation*} \boldsymbol{e}_{\mu^{\prime}}=\Lambda^{\nu}{}_{\mu^{\prime}} \boldsymbol{e}_{\nu}=\left(\frac{\partial x^{\nu}}{\partial x^{\mu^{\prime}}}\right) \boldsymbol{e}_{\nu}, \tag{3.7} \end{equation*}
and we illustrate the use of this in the following example.
${}^{2}$ Polar coordinates are our first example of curvilinear coordinates, which are sets of coordinates where Pythagoras' theorem doesn't hold simply. That is, $s^{2} \neq r^{2}+\theta^{2}$.
Fig. 3.1 (a) The point $\left(x_{0}, y_{0}\right)$ in the Euclidean plane. (b) In polar coordinates, the same point is at $(r, \theta)$.
Fig. 3.2 (a) The coordinate basis $\boldsymbol{e}_{r}$ and $\boldsymbol{e}_{\theta}$ has the feature that the basis vectors do not stay a uniform size. In particular, the length of $\boldsymbol{e}_{\theta}$ increases with $r$, the distance from the origin. (b) The non-coordinate basis $\hat{\boldsymbol{e}}_{r}$ and $\hat{\boldsymbol{e}}_{\theta}$ remains normalized.
${}^{3}$ The word holonomy comes from holo (entire) + nomy (law).

Example 3.2
Plugging our expressions for polar coordinates into eqn 3.7 gives
\begin{align*} & \boldsymbol{e}_{r}=\Lambda^{x}{}_{r} \boldsymbol{e}_{x}+\Lambda^{y}{}_{r} \boldsymbol{e}_{y}=\left(\frac{\partial x}{\partial r}\right)_{\theta} \boldsymbol{e}_{x}+\left(\frac{\partial y}{\partial r}\right)_{\theta} \boldsymbol{e}_{y}=\cos \theta \, \boldsymbol{e}_{x}+\sin \theta \, \boldsymbol{e}_{y}, \tag{3.8}\\ & \boldsymbol{e}_{\theta}=\Lambda^{x}{}_{\theta} \boldsymbol{e}_{x}+\Lambda^{y}{}_{\theta} \boldsymbol{e}_{y}=\left(\frac{\partial x}{\partial \theta}\right)_{r} \boldsymbol{e}_{x}+\left(\frac{\partial y}{\partial \theta}\right)_{r} \boldsymbol{e}_{y}=-r \sin \theta \, \boldsymbol{e}_{x}+r \cos \theta \, \boldsymbol{e}_{y} . \end{align*}
We can also express these results as a matrix equation
\begin{equation*} \begin{pmatrix} \boldsymbol{e}_{r} & \boldsymbol{e}_{\theta} \end{pmatrix}=\begin{pmatrix} \boldsymbol{e}_{x} & \boldsymbol{e}_{y} \end{pmatrix}\begin{pmatrix} \left(\frac{\partial x}{\partial r}\right)_{\theta} & \left(\frac{\partial x}{\partial \theta}\right)_{r} \\ \left(\frac{\partial y}{\partial r}\right)_{\theta} & \left(\frac{\partial y}{\partial \theta}\right)_{r} \end{pmatrix} . \tag{3.9} \end{equation*}
The method we have used hasn't produced the usual basis vectors that we might have expected. Elementary treatments of polar coordinates usually give normalized basis vectors $\hat{\boldsymbol{e}}_{r}$ and $\hat{\boldsymbol{e}}_{\theta}$ given by
\begin{align*} & \hat{\boldsymbol{e}}_{r}=\cos \theta \, \boldsymbol{e}_{x}+\sin \theta \, \boldsymbol{e}_{y}, \\ & \hat{\boldsymbol{e}}_{\theta}=-\sin \theta \, \boldsymbol{e}_{x}+\cos \theta \, \boldsymbol{e}_{y} . \tag{3.10} \end{align*}
The ones we have found (without the hats) have the disquieting feature that they are not all normalized. In fact
\begin{equation*} \boldsymbol{e}_{r} \cdot \boldsymbol{e}_{r}=1 \quad \text{and} \quad \boldsymbol{e}_{\theta} \cdot \boldsymbol{e}_{\theta}=r^{2} . \tag{3.11} \end{equation*}
So $\boldsymbol{e}_{r}$ looks fine, but $\boldsymbol{e}_{\theta}$ grows the further out you go (see Fig. 3.2). We will show that this seemingly odd property is not a bug but a feature! It's actually exactly what you need. It's helpful for two reasons:
(1) $\boldsymbol{e}_{r}$ and $\boldsymbol{e}_{\theta}$ were easy to derive. We just had to plug straight into eqn 2.25 ($\boldsymbol{e}_{\alpha^{\prime}}=\Lambda^{\mu}{}_{\alpha^{\prime}} \boldsymbol{e}_{\mu}$) and out they popped.
(2) More importantly, $\boldsymbol{e}_{r}$ and $\boldsymbol{e}_{\theta}$ form a coordinate basis (also known as a holonomic basis${}^{3}$) whereas $\hat{\boldsymbol{e}}_{r}$ and $\hat{\boldsymbol{e}}_{\theta}$ form a non-coordinate basis (also known as an anholonomic basis). What does that mean? We will return to the notion of coordinate and non-coordinate bases later, but (loosely) the idea is to take a walk around a closed loop in your space and to see if you return to the starting point in the same geometric state as you started. In a coordinate basis, your basis vectors are truly independent and don't depend on each other. This means that they commute (in technical language, the Lie bracket $\left[\boldsymbol{e}_{r}, \boldsymbol{e}_{\theta}\right]=\boldsymbol{e}_{r} \boldsymbol{e}_{\theta}-\boldsymbol{e}_{\theta} \boldsymbol{e}_{r}=0$) which means that you can make a closed path by travelling one unit along $\boldsymbol{e}_{r}$, one unit along $\boldsymbol{e}_{\theta}$, then minus one unit along $\boldsymbol{e}_{r}$ and minus one unit along $\boldsymbol{e}_{\theta}$ and you will get back to your starting point [see Fig. 3.3(a)]. This doesn't work if you use the normalized vectors (where $\left[\hat{\boldsymbol{e}}_{r}, \hat{\boldsymbol{e}}_{\theta}\right] \neq 0$) and you don't get back to your starting point [see Fig. 3.3(b)].
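The normalization statements in eqn 3.11 can be confirmed by writing the coordinate basis vectors of eqn 3.8 in Cartesian components and taking Euclidean scalar products. The sketch below is illustrative only; the function names and the sample radii and angle are assumptions.

```python
import math

def coordinate_basis(r, theta):
    """Cartesian components of e_r and e_theta, read off from eqn 3.8."""
    e_r = (math.cos(theta), math.sin(theta))
    e_theta = (-r * math.sin(theta), r * math.cos(theta))
    return e_r, e_theta

def dot(a, b):
    """Euclidean scalar product of two vectors in Cartesian components."""
    return a[0] * b[0] + a[1] * b[1]

theta = 0.4  # hypothetical angle in radians
for r in (0.5, 1.0, 3.0):  # hypothetical radii
    e_r, e_theta = coordinate_basis(r, theta)
    # Columns: r, e_r.e_r, e_theta.e_theta, e_r.e_theta
    print(r, dot(e_r, e_r), dot(e_theta, e_theta), dot(e_r, e_theta))
```

Each printed row should show a squared length close to 1 for $\boldsymbol{e}_{r}$, close to $r^{2}$ for $\boldsymbol{e}_{\theta}$, and a cross term close to zero, so the two vectors stay orthogonal even though only one of them stays normalized.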
Example 3.3
Consider a function $f\left(x^{\mu}\right)$ which assigns a number to any spacetime point $x^{\mu}$. Now consider a path through spacetime $x^{\mu}(\lambda)$ where $\lambda$ is a number between 0 and 1. Then $f\left(x^{\mu}(\lambda)\right)$ represents a function of that path parameter, giving the value that $f$ takes for every spacetime point along the path. How does $f$ change with $\lambda$? That is given by the derivative of $f$ along the path, written as
\begin{equation*} \frac{\mathrm{d} f}{\mathrm{d} \lambda}=\frac{\mathrm{d} x^{\mu}}{\mathrm{d} \lambda} \frac{\partial f}{\partial x^{\mu}} . \tag{3.12} \end{equation*}
This is true for any function $f$, and so we could write in general that
\begin{equation*} \frac{\mathrm{d}}{\mathrm{d} \lambda}=\frac{\mathrm{d} x^{\mu}}{\mathrm{d} \lambda} \frac{\partial}{\partial x^{\mu}} . \tag{3.13} \end{equation*}
This expression looks a little like that of a vector $\boldsymbol{X}=X^{\mu} \boldsymbol{e}_{\mu}$, with $\mathrm{d} x^{\mu} / \mathrm{d} \lambda$ playing the role of the components of the vector and $\partial / \partial x^{\mu}$ playing the role of the basis vectors. Consequently, we shall identify $\partial / \partial x^{\mu}$ with $\boldsymbol{e}_{\mu}$, and so in our example of polar coordinates we would write
\begin{equation*} \boldsymbol{e}_{r}=\frac{\partial}{\partial r} \quad \text{and} \quad \boldsymbol{e}_{\theta}=\frac{\partial}{\partial \theta} . \tag{3.14} \end{equation*}
Using this trick, it is clear that these basis vectors commute ($\left[\boldsymbol{e}_{r}, \boldsymbol{e}_{\theta}\right]=0$) and serve as a coordinate basis. If we had used the non-coordinate basis
\begin{equation*} \hat{\boldsymbol{e}}_{r}=\frac{\partial}{\partial r} \quad \text{and} \quad \hat{\boldsymbol{e}}_{\theta}=\frac{1}{r} \frac{\partial}{\partial \theta}, \tag{3.15} \end{equation*}
then we would have found that they do not commute, since
\begin{equation*} \left[\hat{\boldsymbol{e}}_{r}, \hat{\boldsymbol{e}}_{\theta}\right] f=\frac{\partial}{\partial r}\left(\frac{1}{r} \frac{\partial f}{\partial \theta}\right)-\frac{1}{r} \frac{\partial}{\partial \theta} \frac{\partial f}{\partial r}=-\frac{1}{r^{2}} \frac{\partial f}{\partial \theta}=-\frac{\hat{\boldsymbol{e}}_{\theta}}{r} f, \tag{3.16} \end{equation*}
and so $\left[\hat{\boldsymbol{e}}_{r}, \hat{\boldsymbol{e}}_{\theta}\right]=-\frac{1}{r^{2}} \frac{\partial}{\partial \theta}=-\frac{\hat{\boldsymbol{e}}_{\theta}}{r} \neq 0$.
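The commutators in this example can also be estimated numerically by treating each basis vector as a differential operator, as in eqns 3.14 and 3.15. The sketch below uses central finite differences; the step size `H`, the test function `f` and the sample point are assumptions made for illustration, and the results are only approximate.

```python
import math

H = 1.0e-4  # finite-difference step (an assumed value)

def d_dr(g):
    """Return the function (r, theta) -> dg/dr, by central differences."""
    return lambda r, th: (g(r + H, th) - g(r - H, th)) / (2.0 * H)

def d_dtheta(g):
    """Return the function (r, theta) -> dg/dtheta, by central differences."""
    return lambda r, th: (g(r, th + H) - g(r, th - H)) / (2.0 * H)

def e_hat_r(g):
    """Normalized radial basis vector of eqn 3.15 acting on g."""
    return d_dr(g)

def e_hat_theta(g):
    """Normalized angular basis vector of eqn 3.15: (1/r) d/dtheta."""
    dg = d_dtheta(g)
    return lambda r, th: dg(r, th) / r

def f(r, th):
    """Hypothetical smooth test function of the polar coordinates."""
    return r * r * math.sin(th)

r0, th0 = 1.5, 0.3  # hypothetical sample point

# Coordinate basis (eqn 3.14): the bracket should vanish
coord_bracket = d_dr(d_dtheta(f))(r0, th0) - d_dtheta(d_dr(f))(r0, th0)

# Non-coordinate basis (eqn 3.15): the bracket should match eqn 3.16
hat_bracket = (e_hat_r(e_hat_theta(f))(r0, th0)
               - e_hat_theta(e_hat_r(f))(r0, th0))
expected = -d_dtheta(f)(r0, th0) / r0**2

print(coord_bracket, hat_bracket, expected)
```

For this particular `f` the right-hand side of eqn 3.16 is $-\cos \theta$, so the last two printed numbers should both be close to $-\cos 0.3$, while the first should be close to zero.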

3.2 Farewell to the position vector

When we first encounter vectors as students, the simplest vector that we usually start with is the position (or displacement) vector $\boldsymbol{x}=x^{\mu} \boldsymbol{e}_{\mu}$. In Chapter 2, we found that the position vector does transform appropriately under Lorentz transformations, making it possible to use it in special relativity. However, from this point onwards we will not be using it in general relativity. The reason is that it does not, in general, transform according to our rule for coordinate transformations${}^{4}$
\begin{equation*} X^{\mu^{\prime}}=\frac{\partial x^{\mu^{\prime}}}{\partial x^{\nu}} X^{\nu} . \tag{3.17} \end{equation*}
The problem is that coordinates are related by a transformation of the form
\begin{equation*} x^{\mu^{\prime}}=A^{\mu^{\prime}}{}_{\nu} x^{\nu}, \tag{3.18} \end{equation*}
and in curved spacetime the coefficients $A^{\mu^{\prime}}{}_{\nu}$ will depend on the coordinates $x^{\nu}$. It is only in the special case${}^{5}$ that $A^{\mu^{\prime}}{}_{\nu}$ is independent of $x^{\nu}$ that we can write that $\partial x^{\mu^{\prime}} / \partial x^{\nu}=A^{\mu^{\prime}}{}_{\nu}$ and eqn 3.17 will then hold for $X^{\mu}=x^{\mu}$. This will also work in the case of linear, homogeneous transformations, such as the Lorentz transformations or spatial rotations. Another way of seeing the same thing is to notice that in a curved spacetime a displacement vector is not very well defined; it may not even live in that space. For example, if we consider only the space describing the Earth's surface, then a displacement vector from New York to Tokyo will be an arrow that ploughs through the interior of the Earth. What is well defined though is a path from New York to Tokyo made up of lots of infinitesimal displacements which can all lie on the Earth's surface.
As a result, we shall now drop the position vector $\boldsymbol{x}$ from our list of well-behaved vectors that transform appropriately, since the transformation we want to use is the one described by eqn 3.17. However, we will still make use of the coordinates $x^{\mu}$ describing particular events in spacetime.${}^{6}$ We might now be concerned that not having a displacement vector prevents us from defining a velocity vector, which was previously the derivative of $\boldsymbol{x}$ with respect to the proper time $\tau$. As we'll see in Chapter 7, this concern is unfounded as the velocity vector can be constructed geometrically from the tangent to the world line of a particle. In any case, as we have been explaining, the vector corresponding to an infinitesimal displacement in spacetime does transform correctly.
$\curvearrowright$ This relationship between vectors and derivatives is explored in detail in Chapter 31. We return to non-coordinate bases in Chapter 10.
${}^{4}$ We can see this in Example 3.1, where the matrix in eqn 3.6 does not allow us to transform between components $x^{\mu^{\prime}}=(r, \theta)$ and $x^{\alpha}=(x, y)$.
${}^{5}$ In flat spacetime, as assumed in special relativity, this condition holds and the displacement vector then presents no problem.
${}^{6}$ A good slogan to bear in mind is that 'coordinates are not vectors'.
Fig. 3.4 A circle of radius $r$ on the surface of a sphere of radius $R$ (in a galaxy far, far away).
Fig. 3.5 A spherical triangle is constructed by three great circles. The sum of the internal angles, $\alpha_{1}+\alpha_{2}+\alpha_{3}$, is greater than $\pi$.
${}^{7}$ Girard's theorem was originally written down by Thomas Harriot (1560–1621), who was also the first person to make a drawing of the moon through a telescope, several months before Galileo, and worked out Snell's law of refraction nearly two decades before Snell, though six centuries after Ibn Sahl. Credit for first discovery is not always apportioned fairly!
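The failure of the coordinates themselves to obey eqn 3.17 can be seen with the matrix of eqn 3.6. The sketch below, which is only an illustration with an assumed sample point and assumed helper names, feeds the Cartesian coordinates $(x, y)$ through that matrix as if they were vector components and compares the result with the actual polar coordinates $(r, \theta)$.

```python
import math

def jacobian_polar_from_cartesian(r, theta):
    """Matrix of eqn 3.6: rows (r, theta), columns (x, y)."""
    return [[math.cos(theta), math.sin(theta)],
            [-math.sin(theta) / r, math.cos(theta) / r]]

x, y = 1.0, 1.0  # hypothetical point in the plane
r, theta = math.hypot(x, y), math.atan2(y, x)
J = jacobian_polar_from_cartesian(r, theta)

# Treat the coordinates (x, y) as if they were vector components
# and push them through the transformation rule of eqn 3.17.
transformed = (J[0][0] * x + J[0][1] * y,
               J[1][0] * x + J[1][1] * y)

print("rule of eqn 3.17 gives:", transformed)
print("actual polar coordinates:", (r, theta))
```

The first component does come out as $r$, but the second comes out close to zero rather than $\theta$, so the pair $(r, \theta)$ is not obtained from $(x, y)$ by the vector transformation rule.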

3.3 Non-Euclidean space

Non-Euclidean space is any space which is not Euclidean, i.e. not equipped with the Euclidean metric with components $\delta_{\mu \nu}$ (so that $\mathrm{d} s^{2}=\delta_{\mu \nu} \mathrm{d} x^{\mu} \mathrm{d} x^{\nu}$). An example of a non-Euclidean space is the Minkowski spacetime of special relativity in which
\begin{equation*} \mathrm{d} s^{2}=\eta_{\mu \nu} \mathrm{d} x^{\mu} \mathrm{d} x^{\nu}=-\mathrm{d} t^{2}+\mathrm{d} x^{2}+\mathrm{d} y^{2}+\mathrm{d} z^{2} . \tag{3.19} \end{equation*}
Minkowski space is known as a (3+1)-dimensional space (meaning events are described by three spatial coordinates and one time coordinate).
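As a quick numerical illustration of eqn 3.19, the following minimal Python sketch evaluates the line element for a few coordinate separations, in units where the speed of light is 1. The function names and the sample separations are our own illustrative choices and do not come from the text. Unlike the Euclidean expression, the Minkowski result can be negative, zero, or positive.

```python
def interval_squared(dt, dx, dy, dz):
    """Minkowski line element of eqn 3.19, with signature (-, +, +, +)."""
    return -dt**2 + dx**2 + dy**2 + dz**2


def euclidean_distance_squared(dx, dy, dz):
    """Euclidean line element in three dimensions, for comparison."""
    return dx**2 + dy**2 + dz**2


# Hypothetical separations (dt, dx, dy, dz), chosen only for illustration.
separations = {
    "mostly in time": (2, 1, 0, 0),
    "along a light ray": (1, 1, 0, 0),
    "mostly in space": (1, 2, 0, 0),
}

for label, (dt, dx, dy, dz) in separations.items():
    minkowski = interval_squared(dt, dx, dy, dz)
    euclid = euclidean_distance_squared(dx, dy, dz)
    print(f"{label}: ds^2 = {minkowski}, Euclidean part = {euclid}")
```

The Euclidean part is never negative, whereas the sign of the full interval depends on how the time and space separations compare.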
One of the consequences of a non-Euclidean space is that some of Euclid's famous results don't always hold.

Example 3.4

In two-dimensional Euclidean space, the circumference $C$ of a circle of radius $r$ is given by $C=2 \pi r$ and the internal angles in a triangle add up to $180^{\circ}$ (or $\pi$ radians). However, these results don't work on the surface of a sphere. The circumference of a circle of radius $r$ on the surface of a sphere of radius $R$ (see Fig. 3.4) is given by
\begin{equation*} C=2 \pi r \operatorname{sinc} \frac{r}{R} \tag{3.20} \end{equation*}
where $\operatorname{sinc} x=(\sin x) / x$, so $C \rightarrow 2 \pi r$ when $r \ll R$. Moreover, from Girard's theorem,$^{7}$ the sum of the internal angles $\sum \alpha_{i}$ of a spherical triangle on the surface of a sphere (see Fig. 3.5) is given by
\begin{equation*} \sum \alpha_{i}=\pi+\frac{A}{R^{2}} \tag{3.21} \end{equation*}
where $A$ is the area of the triangle. Thus, the sum of the angles is greater than $\pi$, although if $A \ll R^{2}$ Euclid's result is good enough. These two results are proved in Exercises 3.3 and 3.4.
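The two results in this example are easy to check numerically. The sketch below is only an illustration under our own assumptions: the function names are hypothetical, the sphere is given unit radius, and the test triangle is the octant bounded by the equator and two meridians $90^{\circ}$ apart, which has three right angles and covers one eighth of the sphere.

```python
import math


def circumference_on_sphere(r, R):
    """Circumference of a circle of radius r drawn on a sphere of radius R.

    This is eqn 3.20, written as 2*pi*R*sin(r/R), which equals 2*pi*r*sinc(r/R).
    """
    return 2 * math.pi * R * math.sin(r / R)


def angle_sum_from_area(area, R):
    """Sum of the internal angles of a spherical triangle (eqn 3.21)."""
    return math.pi + area / R**2


R = 1.0  # assumed sphere radius

# A small circle is nearly Euclidean: the ratio C / (2*pi*r) is close to 1.
small_ratio = circumference_on_sphere(0.01, R) / (2 * math.pi * 0.01)

# A circle centred on the pole with r = pi*R/2 is the equator, of length 2*pi*R.
equator = circumference_on_sphere(math.pi * R / 2, R)

# Octant triangle: area is one eighth of 4*pi*R^2, angles are three right angles.
octant_area = 4 * math.pi * R**2 / 8
octant_angle_sum = angle_sum_from_area(octant_area, R)

print(small_ratio)
print(equator, 2 * math.pi * R)
print(octant_angle_sum, 3 * math.pi / 2)
```

For the equator, the flat-space formula $2\pi r$ would predict $\pi^{2} R$, noticeably more than the true value $2\pi R$.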

Chapter summary

  • Euclidean space uses a metric $\delta_{\mu \nu}$ and gives us the familiar results from Euclidean geometry. It can be described using a Cartesian coordinate system ($x, y, z$, etc.), but also by other coordinate systems (e.g. plane polar coordinates in two dimensions).
  • The basis vectors for one coordinate basis can be derived by transforming those of another coordinate basis (such as the Cartesian one). These basis vectors are independent of one another, so that they commute (their Lie bracket is zero). A non-coordinate basis does not have this property.
  • A non-Euclidean space has a non-Euclidean metric, but it can still be flat (i.e. not curved), and an example is the Minkowski space with metric $\eta_{\mu \nu}$.

Exercises

(3.1) Show that for polar coordinates in two dimensions
\begin{equation*} \boldsymbol{e}_{x}=\frac{x}{r} \boldsymbol{e}_{r}-\frac{y}{r^{2}} \boldsymbol{e}_{\theta}, \tag{3.22} \end{equation*}
and
\begin{equation*} \boldsymbol{e}_{y}=\frac{y}{r} \boldsymbol{e}_{r}+\frac{x}{r^{2}} \boldsymbol{e}_{\theta} \tag{3.23} \end{equation*}
(3.2) For plane polar coordinates, show that
\begin{equation*} \begin{array}{ll} \frac{\partial \boldsymbol{e}_{r}}{\partial \theta}=\frac{\boldsymbol{e}_{\theta}}{r}, & \frac{\partial \boldsymbol{e}_{\theta}}{\partial \theta}=-r \boldsymbol{e}_{r}, \\ \frac{\partial \boldsymbol{e}_{r}}{\partial r}=0, & \frac{\partial \boldsymbol{e}_{\theta}}{\partial r}=\frac{\boldsymbol{e}_{\theta}}{r} \end{array} \tag{3.24} \end{equation*}
(3.3) Using simple geometry, prove eqn 3.20. By defining the curvature $K=1 / R^{2}$ for the sphere, eqn 3.20 becomes $C=2 \pi r \operatorname{sinc}(r \sqrt{K})$. Hence, show that
\begin{equation*} K=\lim _{r \rightarrow 0} \frac{3}{\pi r^{3}}(2 \pi r-C), \tag{3.25} \end{equation*}
and hence the curvature of a sphere can be calculated by comparing the circumference to $2 \pi$ times the radius for circles of ever-decreasing size.
(3.4) To prove Girard's theorem (i.e. to prove eqn 3.21), Fig. 3.6 may be helpful. Three great circles produce a spherical triangle of area $A$ but they also produce another circular triangle on the other side of the sphere. Without loss of generality, you can take the radius of the sphere to be unity, so the total surface area is then $4 \pi$. With two spherical triangles, the remaining area is then $4 \pi-2 A$. That remaining area is made up of strips like the two shown shaded in Fig. 3.6. You should be able to argue that each of those strips has area $2 \alpha_{1}-A$. Putting that together, you should then be able to deduce that $\alpha_{1}+\alpha_{2}+\alpha_{3}=\pi+A$ and hence prove the theorem.
Fig. 3.6 Construction for the proof of Girard's theorem.
(3.5) A transformation to a flat, uniformly rotating frame can be achieved via the transformation
\begin{align*} t & =t^{\prime} \\ x & =x^{\prime} \cos \Omega t^{\prime}-y^{\prime} \sin \Omega t^{\prime} \\ y & =x^{\prime} \sin \Omega t^{\prime}+y^{\prime} \cos \Omega t^{\prime} \\ z & =z^{\prime} \tag{3.26} \end{align*}
where $\Omega$ is the angular speed of the rotation. What form does the Minkowski metric line element $\mathrm{d} s^{2}=\mathrm{d} \boldsymbol{x} \cdot \mathrm{d} \boldsymbol{x}$ take in this rotating frame?

Linear slot machines

Thou, silent form, dost tease us out of thought As doth eternity: Cold Pastoral!
John Keats (1795-1821) Ode on a Grecian Urn (1820)
A vector can be thought of as an arrow in spacetime, but when spacetime is curved some odd things start to happen. If you travel due North from one city to another over a curved surface (see Fig. 4.1) then you might think you are following a vector in the space of that curved surface. However, following the vector takes you out of the curved surface and leaves you hovering in mid-air, suspended over your final destination! This simple example demonstrates the fact that the vectors defined at a point in a curved space don't necessarily live in that space. In fact, the vectors defined at a point in a particular space live in what is called the tangent space. For the example of the Earth's surface, the space is the sphere ($S^{2}$, in the language used by mathematicians) and the tangent space is the two-dimensional (flat) plane ($\mathbb{R}^{2}$, in the language used by mathematicians). Thus, when travelling between two cities on a curved space, the journey is best thought of as a path through the space, not a vector between the end points. Vectors are really things that tell you about the local behaviour at a point (because they exist only in the tangent space [see Fig. 4.2]). For now, it is enough to remember that vectors like $\boldsymbol{X}$ are independent of coordinates, but can be described in a particular coordinate system using basis vectors $\boldsymbol{e}_{\mu}$ and components $X^{\mu}$ in an expression $\boldsymbol{X}=X^{\mu} \boldsymbol{e}_{\mu}$. A vector has a direction and a magnitude, or length.$^{1}$
Vectors are only one of the sorts of objects that we require to produce a geometrical description of Nature. In this chapter, we introduce another object that, in many ways, complements the notion of a vector. It has a rather odd name which comes about because the subject of differential geometry [pioneered by the French mathematician Élie Cartan (1869-1951)] contains the notion of what are called 'differential forms'. These can be of increasing 'degree' $p$ and are then called $p$-forms. Here we only want to consider the simplest such object ($p=1$) which is called a 1-form. Like a vector, a 1-form $\tilde{\boldsymbol{\sigma}}$ exists independently of coordinates. It can be expressed in a particular coordinate system via its components and a set of basis 1-forms $\boldsymbol{\omega}^{\mu}$, in an expression $\tilde{\boldsymbol{\sigma}}=\sigma_{\mu} \boldsymbol{\omega}^{\mu}$. Notice how the positions of the indices in the components and basis are reversed compared to vectors. Notice also our notation: $\boldsymbol{X}$ is a vector, $\tilde{\boldsymbol{\sigma}}$ is a 1-form. The tilde (the wiggly line above the symbol) signifies the 1-form.

4.1 Dot products and down vectors
4.2 Vectors and 1-forms
4.3 Transformations
4.4 Tensors
4.5 Energy-momentum tensor
Chapter summary
Exercises
Fig. 4.1 Oxford and Durham are two cities in the UK, with Durham 337 km (only about 200 miles) due North of Oxford. Travelling due North from Oxford on a straight line leaves you in mid-air, suspended about 9 km above Durham, due to the curvature of the Earth. (Travelling due South from Durham would have the same effect when arriving at Oxford, so the sense of superiority felt by the inhabitants of each city would be the same!) The diagram exaggerates the curvature of the Earth for clarity.
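The figure of about 9 km quoted in the caption of Fig. 4.1 can be checked with a line of trigonometry. The sketch below assumes a perfectly spherical Earth with a mean radius of 6371 km (our assumption, not a value given in the text) and takes the 337 km to be measured along the straight tangent line; the function name is hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean radius of the Earth
DISTANCE_KM = 337.0       # Oxford to Durham, as quoted in Fig. 4.1


def height_above_surface(distance, radius):
    """Height above a sphere after travelling `distance` along a tangent line.

    The tangent line, the radius at the starting point and the line from the
    centre to the end point form a right-angled triangle.
    """
    return math.hypot(radius, distance) - radius


height_km = height_above_surface(DISTANCE_KM, EARTH_RADIUS_KM)
print(f"Height above Durham: {height_km:.1f} km")
```

For distances much smaller than the radius, the result is well approximated by $d^{2} / 2R$.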
$^{1}$ The length of a vector is something about which all observers agree and gives rise to the notion of an invariant equal to $\boldsymbol{X}^{2}$.
Fig. 4.2 A vector $\boldsymbol{X}$ lives in a special space called the tangent space. Points in spacetime can be described by what is called a manifold $M$. In general, this will be curved and therefore a vector cannot live in it, but only in a space which is tangent to it. For some point $p$ in the manifold, there will be a tangent space which (in the notation of differential geometry which we generally avoid in this book) is denoted by $T_{p} M$ (which you can read as 'the tangent space at point $p$ of the manifold $M$').
Fig. 4.3 A 1-form can be described as a set of planes. The inner product between a vector $\boldsymbol{X}$ and the 1-form can then be thought of as the number of planes skewered by the vector.
$^{2}$ We saw these in the definition of the 1-form $\tilde{\boldsymbol{Y}}=Y_{\mu} \boldsymbol{\omega}^{\mu}$ in the last section. The link to 1-forms will be made shortly.
Why do we need this additional object? The reason is that when we combine vectors and 1-forms (which, as we discuss in this chapter, involves forming an inner product), we have to produce a number (i.e. a scalar), and numbers are invariant with respect to coordinate transformations. Thus, if the components of vectors transform in one particular way due to a change of coordinates then we need the components of the object that they combine with to transform in the opposite way so that the result of their combination is independent of the coordinate transformation. This idea might be already familiar as it appears in quantum mechanics; a vector can be represented by a ket $|\psi\rangle$ and it combines with a bra $\langle\phi|$ to make a number $\langle\phi \mid \psi\rangle$. The kets and the bras live in different spaces. Mathematicians think of vectors living in a vector space and the 1-forms live in the dual space to that vector space. Thus, the 1-forms can be thought of as objects that map vectors onto real numbers. (In quantum mechanics, bras live in a dual space to the ket space and can be thought of as objects that map kets onto complex numbers.)
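The idea of components transforming in opposite ways can be illustrated with a small numerical sketch. Here we assume, purely for illustration, a two-dimensional space and a linear change of coordinates described by a hypothetical invertible matrix `L`: the vector components are multiplied by `L`, the 1-form components by its inverse, and the number obtained by combining them is unchanged. All names and values below are our own choices.

```python
def mat_vec(M, v):
    """Matrix acting on a column of vector components: v'^i = M^i_j v^j."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]


def vec_mat(v, M):
    """Row of 1-form components acting on a matrix: s'_j = s_i M^i_j."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]


def inverse_2x2(M):
    """Inverse of a 2x2 matrix with non-zero determinant."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def combine(one_form, vector):
    """Sum over one down index and one up index to produce a number."""
    return sum(s * x for s, x in zip(one_form, vector))


L = [[2.0, 1.0], [0.5, 3.0]]  # hypothetical transformation matrix
X = [1.0, -2.0]               # vector components (up indices)
sigma = [4.0, 0.5]            # 1-form components (down indices)

X_new = mat_vec(L, X)                        # transforms with L
sigma_new = vec_mat(sigma, inverse_2x2(L))   # transforms with the inverse of L

before = combine(sigma, X)
after = combine(sigma_new, X_new)
print(before, after)
```

Had both sets of components been transformed with `L`, the two numbers would in general differ, which is why a second kind of object is needed.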
If a vector can be thought of as an arrow, what geometric object does a 1-form resemble? One answer is a set of equally spaced plane surfaces, as shown in Fig. 4.3. The magnitude of a 1-form corresponds to the spatial frequency of the planes (that is, the reciprocal of the distance between planes). The direction of the 1-form tells us how the planes are arranged. This is most easily seen via the basis 1-forms, which are planes arranged perpendicular to the axes of the coordinate system.
In order to understand objects like vectors and 1-forms, we shall examine the various ways that they can be combined to make numbers. What unites the methods of combining these objects is that they can be represented as machines that generate scalars. We call the machines tensors or, more colourfully, linear slot machines, since the notation we employ features slots in which to insert vectors and 1-forms (and, as we shall see, the operations are linear ones, see Section 4.3). We start by returning to a familiar way of combining two vectors to make a number: the dot product.

4.1 Dot products and down vectors

In Chapter 2, we wrote the dot product as the component equation
\begin{equation*} \boldsymbol{X} \cdot \boldsymbol{Y}=\eta_{\mu \nu} X^{\mu} Y^{\nu} \tag{4.1} \end{equation*}
where $\eta_{\mu \nu}$ are the components of the Minkowski metric and we sum over repeated indices. The result of evaluating a dot product using eqn 4.1 is a scalar, which is to say that the result is the same, no matter which coordinate system we consider. Let's consider some new ways of writing the dot product. We can simplify eqn 4.1 by absorbing the $\eta_{\mu \nu} Y^{\nu}$ part into a new object which has components with indices in the down position,$^{2}$ and we shall call these components $Y_{\mu}$, so that
\begin{equation*} Y_{\mu}=\eta_{\mu \nu} Y^{\nu} \tag{4.2} \end{equation*}
The dot product is now an expression in which we have to sum over one up-index and one down-index. That is to say
\begin{equation*} \boldsymbol{X} \cdot \boldsymbol{Y}=X^{\mu} Y_{\mu} \tag{4.3} \end{equation*}
Another way of looking at eqn 4.2 is that the components of the metric take the index in the up position and replace it with the index in the down position. We say that the metric $\eta_{\mu \nu}$ lowers the index.
Example 4.1
Let's take a dot product of a basis vector $\boldsymbol{e}_{\lambda}$ with a vector $\boldsymbol{X}=X^{\mu} \boldsymbol{e}_{\mu}$. We have
\begin{align*} \boldsymbol{X} \cdot \boldsymbol{e}_{\lambda} & =X^{\mu} \boldsymbol{e}_{\mu} \cdot \boldsymbol{e}_{\lambda} \tag{4.4}\\ & =X^{\mu} \eta_{\mu \lambda} \quad \text{(using eqn 2.15: } \boldsymbol{e}_{\mu} \cdot \boldsymbol{e}_{\lambda}=\eta_{\mu \lambda}\text{)} \\ & =X_{\lambda} \quad \text{(lowering an index).} \end{align*}
Writing out the sum for $\lambda=2$, for example, gives
\begin{equation*} X_{2}=X^{\mu} \eta_{\mu 2}=X^{0} \eta_{02}+X^{1} \eta_{12}+X^{2} \eta_{22}+X^{3} \eta_{32} \tag{4.5} \end{equation*}
In the usual Minkowski space, where $\eta_{\mu \nu}$ is diagonal,$^{3}$ we have
\begin{equation*} X_{2}=X^{2} \eta_{22}=X^{2} \tag{4.6} \end{equation*}
This is true for all of the spatial components (i.e. we have $X_{1}=X^{1}$ and $X_{3}=X^{3}$), but we also have that $X_{0}=-X^{0}$, since $\eta_{00}=-1$.
We define the inverse of $\eta_{\mu \nu}$ as $\eta^{\mu \nu}$, which is to say$^{4}$
\begin{equation*} \eta_{\mu \nu} \eta^{\nu \lambda}=\delta_{\mu}^{\lambda} \tag{4.7} \end{equation*}
By the same logic that led to eqn 4.2, we then also have the action of raising an index
\begin{equation*} X^{\nu}=\eta^{\mu \nu} X_{\mu} \tag{4.8} \end{equation*}
This allows us to write the dot product in terms of down-vector components as
\begin{equation*} \boldsymbol{X} \cdot \boldsymbol{Y}=\eta^{\mu \nu} X_{\mu} Y_{\nu} \tag{4.9} \end{equation*}
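The three ways of writing the dot product (eqns 4.1, 4.3 and 4.9) can be compared directly in a few lines of Python. This is a minimal sketch using plain lists; the helper names `lower_index` and `raise_index` and the sample components are our own illustrative choices.

```python
# Components of the Minkowski metric (eqn 2.15). Its inverse has the same entries.
ETA = [
    [-1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]
ETA_INV = ETA


def lower_index(X_up):
    """X_mu = eta_{mu nu} X^nu (eqn 4.2)."""
    return [sum(ETA[mu][nu] * X_up[nu] for nu in range(4)) for mu in range(4)]


def raise_index(X_down):
    """X^nu = eta^{mu nu} X_mu (eqn 4.8)."""
    return [sum(ETA_INV[mu][nu] * X_down[mu] for mu in range(4)) for nu in range(4)]


# Hypothetical up components (X^0, X^1, X^2, X^3) and (Y^0, Y^1, Y^2, Y^3).
X_up = [2, 1, 0, 3]
Y_up = [1, 4, 2, -1]

X_down = lower_index(X_up)
Y_down = lower_index(Y_up)

dot_eqn_4_1 = sum(ETA[m][n] * X_up[m] * Y_up[n] for m in range(4) for n in range(4))
dot_eqn_4_3 = sum(X_up[m] * Y_down[m] for m in range(4))
dot_eqn_4_9 = sum(ETA_INV[m][n] * X_down[m] * Y_down[n] for m in range(4) for n in range(4))

print(X_down)
print(dot_eqn_4_1, dot_eqn_4_3, dot_eqn_4_9)
```

Lowering the index flips the sign of the time component only, and raising it again recovers the original components.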
Example 4.2
A simple way to understand the geometry of up and down components is to consider Fig. 4.4, showing a vector $\boldsymbol{X}$ expressed in a coordinate system in which the basis vectors are not orthogonal. As usual, the vector is written as
\begin{equation*} \boldsymbol{X}=X^{1} \boldsymbol{e}_{1}+X^{2} \boldsymbol{e}_{2} \tag{4.10} \end{equation*}
From the figure, we see that vectors $X^{1} \boldsymbol{e}_{1}$ and $X^{2} \boldsymbol{e}_{2}$ form the usual parallelogram describing the addition of two vectors, with sides of length $X^{1}$ and $X^{2}$. We saw from the previous example that $X_{1}$ is simply the projection of $\boldsymbol{X}$ along $\boldsymbol{e}_{1}$, achieved using the dot product $\boldsymbol{X} \cdot \boldsymbol{e}_{1}$. So we have
\begin{equation*} X_{j}=\boldsymbol{X} \cdot \boldsymbol{e}_{j} \tag{4.11} \end{equation*}
with $j=1,2$ (in the lowered position), as shown in the figure.
The metric has components $\eta_{\mu \nu}=\boldsymbol{e}_{\mu} \cdot \boldsymbol{e}_{\nu}$, which in this coordinate system is not diagonal. We now use the metric to raise an index, and we obtain
\begin{equation*} X^{i}=\eta^{i j}\left(\boldsymbol{X} \cdot \boldsymbol{e}_{j}\right) \tag{4.12} \end{equation*}
So for $X^{1}$ we have
\begin{equation*} X^{1}=\eta^{11} X_{1}+\eta^{12} X_{2} \tag{4.13} \end{equation*}
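The steps of this example can be followed numerically. The sketch below assumes, for illustration only, two unit basis vectors in the plane separated by $60^{\circ}$ and a vector with up components $X^{1}=2$, $X^{2}=1$; these values and all variable names are our own. It computes the down components from eqn 4.11 and then recovers the up components from eqn 4.12 using the inverse of the (non-diagonal) metric.

```python
import math


def dot(a, b):
    """Ordinary Euclidean dot product of two arrows in the plane."""
    return a[0] * b[0] + a[1] * b[1]


# Assumed non-orthogonal basis: unit vectors 60 degrees apart.
e1 = (1.0, 0.0)
e2 = (math.cos(math.pi / 3), math.sin(math.pi / 3))
basis = [e1, e2]

# Metric components eta_ij = e_i . e_j; the off-diagonal entries are non-zero.
metric = [[dot(basis[i], basis[j]) for j in range(2)] for i in range(2)]

# Inverse metric eta^ij for the 2x2 case.
det = metric[0][0] * metric[1][1] - metric[0][1] * metric[1][0]
metric_inv = [
    [metric[1][1] / det, -metric[0][1] / det],
    [-metric[1][0] / det, metric[0][0] / det],
]

# Build the arrow X = X^1 e_1 + X^2 e_2 (eqn 4.10) from assumed up components.
X_up = [2.0, 1.0]
X = tuple(X_up[0] * e1[k] + X_up[1] * e2[k] for k in range(2))

# Down components by projection onto the basis vectors (eqn 4.11).
X_down = [dot(X, basis[j]) for j in range(2)]

# Raising the index with the inverse metric recovers the up components (eqn 4.12).
X_up_recovered = [sum(metric_inv[i][j] * X_down[j] for j in range(2)) for i in range(2)]

print(X_down)
print(X_up_recovered)
```

With these values the down components come out as approximately $(2.5, 2)$, different from the up components $(2, 1)$, which is the distinction Fig. 4.4 illustrates.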
$^{3}$ Recall eqn 2.15 for the components of the Minkowski metric
\begin{equation*} \eta_{\mu \nu}=\left(\begin{array}{cccc} -1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{array}\right) \end{equation*}
which means that $\eta_{00}=-1$, $\eta_{11}=1$, $\eta_{22}=1$, $\eta_{33}=1$, and all other elements are zero.
$^{4}$ This means that
\begin{equation*} \eta^{\mu \nu}=\left(\begin{array}{cccc} -1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{array}\right) \end{equation*}
and hence $\eta^{00}=-1$, $\eta^{11}=1$, $\eta^{22}=1$, $\eta^{33}=1$, and all other elements are zero. Thus, $\eta^{\mu \nu}$ and $\eta_{\mu \nu}$ act like the same matrix. This property will not hold for most second-rank tensors (i.e. for objects with two indices, to be defined later in this chapter).
Fig. 4.4 The geometry of the up and down components.
$^{5}$ Despite the route we have taken, it is not the case that 1-forms owe their existence to the metric or to vectors. In fact, they can exist more generally in a system where a metric is not defined. Although we shall not abandon the metric until later in the book, we turn to the more general properties of 1-forms in the next section.
Fig. 4.5 The metric tensor can be thought of as a kind of 'slot machine', written as $\boldsymbol{\eta}(\ ,\ )$ in mathematical symbols, but here is a mental picture of this object. The machine has two slots into which you have to insert vectors. Once you have inserted them, then turn the handle (meaning evaluate eqn 4.17), and out pops a number which is the output of the machine.
$^{6}$ This corresponds to the procedure of summing over all up and down components in expressions like $\eta_{\mu \nu} X^{\mu} u^{\nu}$.
The inner product is a linear object, which is to say that, if $a$ and $b$ are constants, we have
$\langle a \tilde{\boldsymbol{\sigma}}, b \boldsymbol{X}\rangle=a b\langle\tilde{\boldsymbol{\sigma}}, \boldsymbol{X}\rangle,$
and also, if $\boldsymbol{Y}$ is another vector and $\tilde{\boldsymbol{\zeta}}$ another 1-form, that
$\langle\tilde{\boldsymbol{\sigma}},(\boldsymbol{X}+\boldsymbol{Y})\rangle=\langle\tilde{\boldsymbol{\sigma}}, \boldsymbol{X}\rangle+\langle\tilde{\boldsymbol{\sigma}}, \boldsymbol{Y}\rangle,$
$\langle(\tilde{\boldsymbol{\sigma}}+\tilde{\boldsymbol{\zeta}}), \boldsymbol{X}\rangle=\langle\tilde{\boldsymbol{\sigma}}, \boldsymbol{X}\rangle+\langle\tilde{\boldsymbol{\zeta}}, \boldsymbol{X}\rangle.$
$^{7}$ Note that we are writing that $\boldsymbol{X}(\tilde{\boldsymbol{\sigma}})=\tilde{\boldsymbol{\sigma}}(\boldsymbol{X})$, namely that a vector operating on a 1-form gives the same result as a 1-form operating on a vector. We needn't do that (the mathematics doesn't insist upon it), and for example in quantum mechanics the analogue doesn't hold: $\langle\sigma \mid X\rangle$ is the complex conjugate of $\langle X \mid \sigma\rangle$, and the two are only equal if $\langle\sigma \mid X\rangle$ is real. In general relativity, the quantities we use are real and we will always be able to assume that these give the same result.
The existence of down components implies that, just as we have vectors built from up components and basis vectors, there exist objects whose components are the down components. These objects are the 1-forms and are written as
\begin{equation*} \tilde{\boldsymbol{X}}=X_{\mu} \boldsymbol{\omega}^{\mu}, \tag{4.14} \end{equation*}
where the basis is made up$^{5}$ from basis 1-forms $\boldsymbol{\omega}^{\mu}$.

4.2 Vectors and 1-forms

Previously, the dot product of two vectors was built using the metric components via
\begin{equation*} \boldsymbol{X} \cdot \boldsymbol{Y}=\eta_{\mu \nu} X^{\mu} Y^{\nu}=X^{\mu} Y_{\mu} . \tag{4.15} \end{equation*}
We can think of the metric in a different way. We take the metric to be the slot machine $\boldsymbol{\eta}(\ ,\ )$. This machine has two slots into which we can input vectors (see Fig. 4.5). The machine outputs a scalar, which is the dot product of the two vectors we have inserted. So take the metric $\boldsymbol{\eta}(\ ,\ )$ and fill in the slots with vectors $\boldsymbol{X}$ and $\boldsymbol{Y}$ to obtain $\boldsymbol{\eta}(\boldsymbol{X}, \boldsymbol{Y})$. This can be written in components as
\begin{align*} \boldsymbol{\eta}(\boldsymbol{X}, \boldsymbol{Y}) & =\eta_{\mu \nu} X^{\mu} Y^{\nu} \tag{4.16}\\ & =\eta_{00} X^{0} Y^{0}+\eta_{11} X^{1} Y^{1}+\eta_{22} X^{2} Y^{2}+\eta_{33} X^{3} Y^{3}, \end{align*}
just as we had before. The slot machine is linear, which is to say that, if $a$ and $b$ are scalars, then the following rules hold:
\begin{align*} \boldsymbol{\eta}(a \boldsymbol{X}, b \boldsymbol{Y}) & =a b \boldsymbol{\eta}(\boldsymbol{X}, \boldsymbol{Y}) \\ \boldsymbol{\eta}(\boldsymbol{X}+\boldsymbol{Y}, \boldsymbol{Z}) & =\boldsymbol{\eta}(\boldsymbol{X}, \boldsymbol{Z})+\boldsymbol{\eta}(\boldsymbol{Y}, \boldsymbol{Z}) \tag{4.17} \end{align*}
We call the metric slot machine $\boldsymbol{\eta}(\ ,\ )$ a $(0,2)$ tensor. The notation $(m, n)$ gives the valence of a tensor: how many indices the components have in the up ($m$) and down ($n$) positions. Since the components of the metric tensor $\boldsymbol{\eta}(\ ,\ )$ are $\eta_{\mu \nu}$, we have two down indices and so $m=0, n=2$.
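The slot-machine picture of the metric translates naturally into a function of two arguments. The following is a minimal sketch, assuming vectors are stored as lists of four up components; the function names `eta`, `scale` and `add` and the sample values are hypothetical. It evaluates eqn 4.16 and checks the two linearity rules of eqn 4.17 for one choice of inputs.

```python
# Components eta_{mu nu} of the Minkowski metric (eqn 2.15).
ETA = [
    [-1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]


def eta(X, Y):
    """The metric as a machine with two vector slots (eqn 4.16)."""
    return sum(ETA[m][n] * X[m] * Y[n] for m in range(4) for n in range(4))


def scale(a, X):
    """Multiply every component of X by the scalar a."""
    return [a * x for x in X]


def add(X, Y):
    """Add two vectors component by component."""
    return [x + y for x, y in zip(X, Y)]


# Hypothetical vectors and scalars, chosen as integers so the checks are exact.
X = [2, 1, 0, 3]
Y = [1, 4, 2, -1]
Z = [0, 5, -2, 1]
a, b = 3, -2

# First rule of eqn 4.17: scalars can be pulled out of either slot.
print(eta(scale(a, X), scale(b, Y)), a * b * eta(X, Y))

# Second rule of eqn 4.17: the machine distributes over a sum in a slot.
print(eta(add(X, Y), Z), eta(X, Z) + eta(Y, Z))
```

A single numerical check is not a proof of linearity, but it shows how the two slots are filled and the handle is turned.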
Next we identify vectors as valence $(1,0)$ tensors and 1-forms, such as $\tilde{\boldsymbol{\sigma}}=\sigma_{\mu} \boldsymbol{\omega}^{\mu}$, as $(0,1)$ tensors. In filling the slots to make a scalar, the sum of valences of all objects involved must make $m$ and $n$ equal.$^{6}$ So inputting two $(1,0)$ vectors into a $(0,2)$ tensor gives $(1,0)+(1,0)+(0,2)=(2,2)$, so that $m=n$, and this then yields a scalar.
What does the slot machine interpretation imply for vectors and 1-forms? A vector, taken as a $(1,0)$ tensor, has a slot that can be filled with a $(0,1)$ tensor to make a number. We rewrite the vector to show its slot as $\boldsymbol{X}(\ )$. Insert a 1-form into the slot of a vector to obtain $\boldsymbol{X}(\tilde{\boldsymbol{\sigma}})$. This is equivalent to inserting a vector into the slot of a 1-form, $\tilde{\boldsymbol{\sigma}}(\boldsymbol{X})$. To put things on an equal footing we write this as a linear operation known as an inner product (known sometimes as a contraction) using angle brackets as follows:$^{7}$
(4.18) σ ~ , X = X ( σ ~ ) = σ ~ ( X ) (4.18) σ ~ , X = X ( σ ~ ) = σ ~ ( X ) {:(4.18)(: tilde(sigma)","X:)=X( tilde(sigma))= tilde(sigma)(X):}\begin{equation*} \langle\tilde{\boldsymbol{\sigma}}, \boldsymbol{X}\rangle=\boldsymbol{X}(\tilde{\boldsymbol{\sigma}})=\tilde{\boldsymbol{\sigma}}(\boldsymbol{X}) \tag{4.18} \end{equation*}(4.18)σ~,X=X(σ~)=σ~(X)
To calculate the inner product, we expand the components and use linearity to find
(4.19) σ ~ , X = σ ν ω ν , X μ e μ = σ ν X μ ω ν , e μ (4.19) σ ~ , X = σ ν ω ν , X μ e μ = σ ν X μ ω ν , e μ {:(4.19)(: tilde(sigma)","X:)=(:sigma_(nu)omega^(nu),X^(mu)e_(mu):)=sigma_(nu)X^(mu)(:omega^(nu),e_(mu):):}\begin{equation*} \langle\tilde{\boldsymbol{\sigma}}, \boldsymbol{X}\rangle=\left\langle\sigma_{\nu} \boldsymbol{\omega}^{\nu}, X^{\mu} \boldsymbol{e}_{\mu}\right\rangle=\sigma_{\nu} X^{\mu}\left\langle\boldsymbol{\omega}^{\nu}, \boldsymbol{e}_{\mu}\right\rangle \tag{4.19} \end{equation*}(4.19)σ~,X=σνων,Xμeμ=σνXμων,eμ
To compute this, we need a rule for the inner product of the basis 1 forms and basis vectors ω ν , e μ ω ν , e μ (:omega^(nu),e_(mu):)\left\langle\boldsymbol{\omega}^{\nu}, \boldsymbol{e}_{\mu}\right\rangleων,eμ. This is perhaps the most important rule for manipulating tensors and is given by
(4.20) ω ν , e μ = δ ν μ (4.20) ω ν , e μ = δ ν μ {:(4.20)(:omega^(nu),e_(mu):)=delta^(nu)_(mu):}\begin{equation*} \left\langle\boldsymbol{\omega}^{\nu}, \boldsymbol{e}_{\mu}\right\rangle=\delta^{\nu}{ }_{\mu} \tag{4.20} \end{equation*}(4.20)ων,eμ=δνμ
Using this rule, we have
\begin{equation*} \langle\tilde{\boldsymbol{\sigma}}, \boldsymbol{X}\rangle=\sigma_{\nu} X^{\mu}\left\langle\boldsymbol{\omega}^{\nu}, \boldsymbol{e}_{\mu}\right\rangle=\sigma_{\nu} X^{\mu} \delta^{\nu}{ }_{\mu}=\sigma_{\mu} X^{\mu} . \tag{4.21} \end{equation*}
As promised at the start of this chapter, we see that the components of vectors and 1-forms are combined to make a scalar.
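The contraction in eqns 4.19 to 4.21 can be checked numerically. The following is a minimal sketch in plain Python, assuming four-dimensional components; the numbers in `sigma` and `X` and the helper `delta` are illustrative choices, not taken from the text.

```python
# Illustrative components in some basis (assumed values).
sigma = [2.0, -1.0, 0.5, 3.0]   # down components sigma_nu of the 1-form
X = [1.0, 4.0, 2.0, -1.0]       # up components X^mu of the vector


def delta(nu, mu):
    """Kronecker delta, playing the role of <omega^nu, e_mu> in eqn 4.20."""
    return 1.0 if nu == mu else 0.0


# Double sum of eqn 4.19, with the basis rule of eqn 4.20 inserted.
double_sum = sum(sigma[nu] * X[mu] * delta(nu, mu)
                 for nu in range(4) for mu in range(4))

# Single contraction sigma_mu X^mu, the final form in eqn 4.21.
contraction = sum(s * x for s, x in zip(sigma, X))

print(double_sum, contraction)  # both give -4.0
```

The Kronecker delta simply discards every term in which the two indices differ, which is why the double sum collapses to a single sum.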
Example 4.3
Having basis vectors and 1-forms available allows us a simple method to extract components. An up component of a vector can be extracted by feeding a basis 1-form $\boldsymbol{\omega}^{\mu}$ into the vector's slot
\begin{equation*} \boldsymbol{X}\left(\boldsymbol{\omega}^{\mu}\right)=\left\langle\boldsymbol{\omega}^{\mu}, \boldsymbol{X}\right\rangle=X^{\nu}\left\langle\boldsymbol{\omega}^{\mu}, \boldsymbol{e}_{\nu}\right\rangle=X^{\nu} \delta^{\mu}{ }_{\nu}=X^{\mu} \tag{4.22} \end{equation*}
Similarly for the 1-form, we extract its components by inserting a basis vector into its slot
\begin{equation*} \tilde{\boldsymbol{\sigma}}\left(\boldsymbol{e}_{\mu}\right)=\left\langle\tilde{\boldsymbol{\sigma}}, \boldsymbol{e}_{\mu}\right\rangle=\sigma_{\nu}\left\langle\boldsymbol{\omega}^{\nu}, \boldsymbol{e}_{\mu}\right\rangle=\sigma_{\nu} \delta^{\nu}{ }_{\mu}=\sigma_{\mu} \tag{4.23} \end{equation*}
We introduced the 1-form geometrically as a set of planes and the vector as an arrow. The inner product also has a geometrical interpretation: we think of the vector arrow piercing the 1-form planes, as described in the next example.

Example 4.4

In 1924, Louis Victor Pierre Raymond, 7th duc de Broglie, proposed that all particles have wave-like properties. A particle's momentum $\boldsymbol{p}$ is related to its wavevector $\boldsymbol{k}$ via $\boldsymbol{p}=\hbar \boldsymbol{k}$. Here, the magnitude of the wavevector is related to the particle's wavelength $\lambda$ via $|\boldsymbol{k}|=2 \pi / \lambda$. The amplitude $\psi$ of a wave is written as a complex exponential with a phase $\phi$
\begin{equation*} \psi(x)=A \mathrm{e}^{\mathrm{i} \phi}=A \mathrm{e}^{\mathrm{i} \boldsymbol{k} \cdot \boldsymbol{x}}=A \mathrm{e}^{\mathrm{i} \boldsymbol{p} \cdot \boldsymbol{x} / \hbar} \tag{4.24} \end{equation*}
We can describe the quantum wave/particle by its momentum vector. If we want to know the phase difference $\Delta \phi$ between the wave at two positions $\boldsymbol{x}_{1}$ and $\boldsymbol{x}_{2}$, separated by a vector $\boldsymbol{x}=\boldsymbol{x}_{2}-\boldsymbol{x}_{1}$, we can evaluate $\Delta \phi=\boldsymbol{k} \cdot \boldsymbol{x}$, that is, the dot product of $\boldsymbol{k}$ and the vector $\boldsymbol{x}$ linking the two points. This works elegantly in Minkowski space. The 4-vector $\boldsymbol{k}$ is related to $\boldsymbol{p}$ by $\boldsymbol{p}=\hbar \boldsymbol{k}$, where $\boldsymbol{p}=(E, \vec{p})$ and $\boldsymbol{k}=(\omega, \vec{k})$, and now the phase is $\Delta \phi=\boldsymbol{k} \cdot \boldsymbol{x}=\vec{k} \cdot \vec{x}-\omega t$.
${ }^{8}$ Recap: We previously defined a momentum vector $\boldsymbol{p}$ with components $p^{\mu}=\left(E, p^{x}, p^{y}, p^{z}\right)$ and a velocity vector $\boldsymbol{u}$ with components $u^{\mu}=\gamma\left(1, v^{1}, v^{2}, v^{3}\right)$. The Minkowski tensor can be used to produce down versions $p_{\mu}=\left(-E, p^{x}, p^{y}, p^{z}\right)$ and $u_{\mu}=\gamma\left(-1, v^{1}, v^{2}, v^{3}\right)$.
${ }^{9}$ See eqn 2.38 $\left[X_{0}^{\text{obs}}=-\boldsymbol{X} \cdot \boldsymbol{u}\right]$ or, using this chapter's ideas, $X_{0}^{\text{obs}}=-\langle\tilde{\boldsymbol{X}}, \boldsymbol{u}\rangle \equiv-\tilde{\boldsymbol{X}}(\boldsymbol{u})$.
${ }^{10}$ Recall that in the particle's rest frame, we have components $u^{\mu}=(1,0,0,0)$, so we can write $\boldsymbol{u}=u^{\mu} \boldsymbol{e}_{\mu}=\boldsymbol{e}_{0}$.
${ }^{11}$ Once again, we can use eqn 2.38 but this time with $\boldsymbol{J}$ as a 4-vector.
Note that in these examples, we could turn things around. For example, we could treat $\tilde{\boldsymbol{u}}$ as a 1-form and have $\boldsymbol{J}$ as a vector and then write the final answer as $\tilde{\boldsymbol{u}}(\boldsymbol{J})=-n$. In component form, eqns 4.27 and 4.28 can be written as $p_{\mu} u^{\mu}=-E$ and $J_{\mu} u^{\mu}=-n$.
${ }^{12}$ This should be unsurprising since we have already done this using components in eqn 4.2, using the metric tensor to convert an object with up-indices into one with down-indices. Here though, we are doing it entirely geometrically, without worrying about the components.
There is another way. Instead of a momentum vector, we can imagine equally spaced, parallel surfaces separated by a distance proportional to the wavelength of the wave. We'll call this set of surfaces the wave's 1-form $\tilde{\boldsymbol{k}}$. They are in fact the surfaces of constant phase in the wave. Now if we want to know the phase difference between two points, we simply evaluate the inner product $\langle\tilde{\boldsymbol{k}}, \boldsymbol{x}\rangle$, which we can think of as a machine that counts the number of the surfaces of $\tilde{\boldsymbol{k}}$ that the vector $\boldsymbol{x}$ pierces (see Fig. 4.3). We have
\begin{equation*} \Delta \phi=\langle\tilde{\boldsymbol{k}}, \boldsymbol{x}\rangle=\text { (number of surfaces pierced). } \tag{4.25} \end{equation*}
See from Fig. 4.3 how a vector appears to pierce some number of the 1-form's planes: this number is equal to the inner product. We input a 1-form into the inner product's first slot, and a vector into the second. The inner product slot machine outputs a number telling us how many 1-form planes are pierced by the vector or
\begin{equation*} \langle\tilde{\boldsymbol{\sigma}}, \boldsymbol{X}\rangle=\binom{\text { Number of planes of the 1-form } \tilde{\boldsymbol{\sigma}}}{\text { pierced by the vector } \boldsymbol{X}} \tag{4.26} \end{equation*}
With 1-forms as part of our machinery, a natural question is what physical quantities they represent. We examine this in the next example.

Example 4.5

The momentum of a particle${ }^{8}$ can be represented by a 1-form, whose components are given from the Lagrangian by $p_{\mu}=\partial L / \partial \dot{x}^{\mu}$. So a particle has momentum 1-form $\tilde{\boldsymbol{p}}()$. Insert the velocity 4-vector $\boldsymbol{u}$ into its slot and we output a number. This quantity is${ }^{9}$ minus the energy of the particle, $-E$, as measured by an observer $O_{\boldsymbol{u}}$ with velocity vector $\boldsymbol{u}$ tangent to their world line. That is
\begin{equation*} \tilde{\boldsymbol{p}}(\boldsymbol{u})=-E=-\binom{\text { Energy of particle }}{\text { measured by } O_{\boldsymbol{u}}} . \tag{4.27} \end{equation*}
We can test this equation in the particle's rest frame,${ }^{10}$ in which $O_{\boldsymbol{u}}$ has $\boldsymbol{u}=\boldsymbol{e}_{0}$, which means $E=-p_{\mu}\left\langle\boldsymbol{\omega}^{\mu}, \boldsymbol{e}_{0}\right\rangle=-p_{0}$. Using the Minkowski metric, we have $-p_{0}=p^{0}$, which is indeed the particle's energy.
An observer with velocity $\boldsymbol{u}$ passes through a cloud of dust carrying a small, permeable box of a known spatial volume (known as a 3-volume). The observer makes measurements by counting the number of particles in the box. We define the particle current 1-form $\tilde{\boldsymbol{J}}()$. Insert${ }^{11}$ the velocity vector of the observer $\boldsymbol{u}$ and output (minus) the number density of particles, $-n$, measured by the observer with velocity $\boldsymbol{u}$. That is
\begin{equation*} \tilde{\boldsymbol{J}}(\boldsymbol{u})=-n=-\binom{\text { number density of particles }}{\text { measured by } O_{\boldsymbol{u}}} . \tag{4.28} \end{equation*}
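Equation 4.27 can be tried out with numbers. The sketch below assumes units with $c=1$, the metric signature $(-,+,+,+)$ used in the recap, and motion along the $x$ direction only; the mass `m` and the speeds `w` and `v` are illustrative values. It compares $-\tilde{\boldsymbol{p}}(\boldsymbol{u})$ with the energy obtained from the standard Lorentz transformation $E^{\prime}=\gamma\left(E-v p^{x}\right)$.

```python
import math

eta_diag = [-1.0, 1.0, 1.0, 1.0]   # diagonal of the Minkowski metric

# Particle of mass m moving with speed w along x (assumed values).
m, w = 1.0, 0.6
gamma_w = 1.0 / math.sqrt(1.0 - w * w)
p_up = [gamma_w * m, gamma_w * m * w, 0.0, 0.0]        # p^mu = (E, p^x, p^y, p^z)
p_down = [eta_diag[i] * p_up[i] for i in range(4)]     # p_mu = (-E, p^x, p^y, p^z)

# Observer moving with speed v along x (assumed value).
v = 0.3
gamma_v = 1.0 / math.sqrt(1.0 - v * v)
u_up = [gamma_v, gamma_v * v, 0.0, 0.0]                # u^mu = gamma (1, v^1, v^2, v^3)

# Feed u into the slot of the momentum 1-form: p~(u) = p_mu u^mu = -E.
E_observed = -sum(p_down[i] * u_up[i] for i in range(4))

# Independent check: energy in the observer's frame from a Lorentz boost.
E_boosted = gamma_v * (p_up[0] - v * p_up[1])

# Rest-frame observer u = e_0 picks out -p_0 = p^0.
E_rest_observer = -sum(p_down[i] * [1.0, 0.0, 0.0, 0.0][i] for i in range(4))

print(E_observed, E_boosted, E_rest_observer)
```

The first two numbers agree, and the third reproduces $p^{0}$, in line with the rest-frame test described in the example.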
At the start of the chapter, we described vectors as living in a tangent space. 1-forms live in a different space, known as a dual space. Equation 4.20 $\left(\left\langle\boldsymbol{\omega}^{\nu}, \boldsymbol{e}_{\mu}\right\rangle=\delta^{\nu}{ }_{\mu}\right)$ gives the relationship between these spaces. An interesting question is whether it is possible to map objects between these two spaces, such that one could take a vector and then find an equivalent 1-form. Such a mapping is carried out using the metric tensor.${ }^{12}$ Notice that the inner product $\langle\tilde{\boldsymbol{\sigma}}, \boldsymbol{X}\rangle=\sigma_{\mu} X^{\mu}$ is just as if we had taken the dot product of two vectors $\boldsymbol{X}=X^{\mu} \boldsymbol{e}_{\mu}$ and $\boldsymbol{\sigma}=\sigma^{\nu} \boldsymbol{e}_{\nu}$ (i.e. the components of the 1-form with the index raised). In fact, we have
\begin{equation*} \boldsymbol{\sigma} \cdot \boldsymbol{X}=\boldsymbol{\eta}(\boldsymbol{\sigma}, \boldsymbol{X}) \equiv\langle\tilde{\boldsymbol{\sigma}}, \boldsymbol{X}\rangle \tag{4.29} \end{equation*}
This allows us to read off what happens if we fill in just one slot in the metric, $\boldsymbol{\eta}(\boldsymbol{\sigma},)$. Since, upon doing this, we still have one slot left to input a vector, the output must be a $(0,1)$ tensor, also known as a 1-form. We conclude that
\begin{equation*} \boldsymbol{\eta}(\boldsymbol{\sigma},)=\tilde{\boldsymbol{\sigma}} . \tag{4.30} \end{equation*}
That is, the metric slot machine maps vectors onto 1-forms.
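The idea of filling only one slot of the metric maps naturally onto partial function application. The following sketch assumes the Minkowski metric with signature $(-,+,+,+)$; the function name `eta`, the use of `functools.partial`, and the component values are illustrative choices rather than anything prescribed by the text.

```python
from functools import partial

ETA = [[-1, 0, 0, 0],
       [0, 1, 0, 0],
       [0, 0, 1, 0],
       [0, 0, 0, 1]]   # components eta_{mu nu}


def eta(a, b):
    """Metric slot machine: two vectors (up components) in, one number out."""
    return sum(ETA[m][n] * a[m] * b[n] for m in range(4) for n in range(4))


sigma_up = [2, 1, 0, 3]   # components sigma^mu of the vector sigma (assumed)
X = [1, 4, 2, -1]         # components X^mu of the vector X (assumed)

# Fill the first slot only: eta(sigma, ) still accepts one vector,
# so it behaves as the 1-form sigma~ of eqn 4.30.
sigma_tilde = partial(eta, sigma_up)

# The same 1-form in components: sigma_nu = eta_{mu nu} sigma^mu.
sigma_down = [sum(ETA[m][n] * sigma_up[m] for m in range(4)) for n in range(4)]
pairing = sum(sigma_down[n] * X[n] for n in range(4))   # <sigma~, X>

print(sigma_down)                 # [-2, 1, 0, 3]
print(sigma_tilde(X), pairing)    # -1 -1
```

Both routes give the same number, which is the content of eqn 4.29.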

4.3 Transformations

We have stressed the role of transformations between sets of coordinates. The inner product $X_{\alpha^{\prime}} Y^{\alpha^{\prime}}$ is a scalar and should, therefore, be coordinate invariant. Recall that the up components of a vector transform as
\begin{equation*} X^{\alpha^{\prime}}=\Lambda^{\alpha^{\prime}}{ }_{\mu} X^{\mu}=\frac{\partial x^{\alpha^{\prime}}}{\partial x^{\mu}} X^{\mu}, \tag{4.31} \end{equation*}
where $x^{\alpha^{\prime}}$ and $x^{\mu}$ are two sets of coordinates.${ }^{13}$ Since we have that $\Lambda^{\nu}{ }_{\alpha^{\prime}} \Lambda^{\alpha^{\prime}}{ }_{\mu}=\delta^{\nu}{ }_{\mu}$, it must be the case that the down components transform as
\begin{equation*} X_{\beta^{\prime}}=\Lambda^{\nu}{ }_{\beta^{\prime}} X_{\nu}=\frac{\partial x^{\nu}}{\partial x^{\beta^{\prime}}} X_{\nu} \tag{4.32} \end{equation*}
with the result that
\begin{equation*} X_{\alpha^{\prime}} Y^{\alpha^{\prime}}=X_{\mu} \Lambda^{\mu}{ }_{\alpha^{\prime}} \Lambda^{\alpha^{\prime}}{ }_{\nu} Y^{\nu}=X_{\mu} \delta^{\mu}{ }_{\nu} Y^{\nu}=X_{\mu} Y^{\mu} \tag{4.33} \end{equation*}
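The invariance in eqn 4.33 can be verified for a concrete transformation. This sketch assumes a Lorentz boost along $x$ with $c=1$ as the coordinate change; the helper `boost`, the speed `v`, and the component values are illustrative assumptions.

```python
import math


def boost(v):
    """Boost matrix along x with speed v (c = 1), stored as M[row][column]."""
    g = 1.0 / math.sqrt(1.0 - v * v)
    return [[g, -g * v, 0.0, 0.0],
            [-g * v, g, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]


v = 0.6
L = boost(v)       # L[a][m]    plays the role of Lambda^{alpha'}_mu
Linv = boost(-v)   # Linv[n][b] plays the role of Lambda^{nu}_{beta'}

X_down = [-2.0, 1.0, 0.0, 3.0]   # X_mu (assumed values)
Y_up = [1.0, 4.0, 2.0, -1.0]     # Y^mu (assumed values)

# Up components transform with Lambda (eqn 4.31).
Y_up_primed = [sum(L[a][m] * Y_up[m] for m in range(4)) for a in range(4)]
# Down components transform with the inverse matrix (eqn 4.32).
X_down_primed = [sum(Linv[n][b] * X_down[n] for n in range(4)) for b in range(4)]

before = sum(X_down[m] * Y_up[m] for m in range(4))
after = sum(X_down_primed[a] * Y_up_primed[a] for a in range(4))

# Product of the two matrices, which should be the Kronecker delta.
identity = [[sum(Linv[n][a] * L[a][m] for a in range(4)) for m in range(4)]
            for n in range(4)]

print(before, after)
```

The individual components change under the boost, but the contraction does not, up to floating-point rounding.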
Compared with the results from the previous chapter, we see that the down components transform in the same way as the basis vectors $\boldsymbol{e}_{\mu}$.${ }^{14}$ It should then come as no surprise that the basis 1-forms transform in the same way as the vector components, as demonstrated in the next example.
Example 4.6
We can see how to transform basis 1-forms by considering the contraction $\left\langle\boldsymbol{\omega}^{\beta}, \boldsymbol{e}_{\alpha}\right\rangle=\delta^{\beta}{ }_{\alpha}$. First, multiply through by a coordinate transformation $\Lambda^{\alpha}{ }_{\gamma^{\prime}}$ to find
\begin{equation*} \left\langle\boldsymbol{\omega}^{\beta}, \Lambda^{\alpha}{ }_{\gamma^{\prime}} \boldsymbol{e}_{\alpha}\right\rangle=\Lambda^{\alpha}{ }_{\gamma^{\prime}} \delta^{\beta}{ }_{\alpha}=\Lambda^{\beta}{ }_{\gamma^{\prime}} \tag{4.34} \end{equation*}
Now multiply through by $\Lambda^{\sigma^{\prime}}{ }_{\beta}$ and write
\begin{equation*} \left\langle\Lambda^{\sigma^{\prime}}{ }_{\beta} \boldsymbol{\omega}^{\beta}, \Lambda^{\alpha}{ }_{\gamma^{\prime}} \boldsymbol{e}_{\alpha}\right\rangle=\Lambda^{\sigma^{\prime}}{ }_{\beta} \Lambda^{\beta}{ }_{\gamma^{\prime}}=\delta^{\sigma^{\prime}}{ }_{\gamma^{\prime}} \tag{4.35} \end{equation*}
Comparing against $\left\langle\boldsymbol{\omega}^{\sigma^{\prime}}, \boldsymbol{e}_{\gamma^{\prime}}\right\rangle=\delta^{\sigma^{\prime}}{ }_{\gamma^{\prime}}$, we conclude
\begin{equation*} \boldsymbol{\omega}^{\sigma^{\prime}}=\Lambda^{\sigma^{\prime}}{ }_{\beta} \boldsymbol{\omega}^{\beta} \tag{4.36} \end{equation*}
Once more, with feeling: the basis 1-forms transform in the same way as the vector components.
Written out in components, eqn 4.30 is
\begin{equation*} \eta_{\mu \nu} X^{\mu}=X_{\nu} \end{equation*}
where $X_{\nu}$ are the components of the 1-form $\tilde{\boldsymbol{X}}=X_{\alpha} \boldsymbol{\omega}^{\alpha}$. This means that in Example 4.4, the 1-form $\tilde{\boldsymbol{k}}$ has components $k_{\mu}=\eta_{\mu \nu} k^{\nu}=(-\omega, \vec{k})$ and so $\langle\tilde{\boldsymbol{k}}, \boldsymbol{x}\rangle$ yields the required phase.
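As a quick numerical illustration of this statement, the sketch below lowers the index on a wave 4-vector and contracts it with a displacement. It assumes the signature $(-,+,+,+)$ and $c=1$; the frequency, wavevector and displacement values are made up for the example.

```python
eta_diag = [-1.0, 1.0, 1.0, 1.0]     # diagonal of the Minkowski metric

omega, k_vec = 3.0, [2.0, 0.0, 1.0]  # frequency and spatial wavevector (assumed)
t, x_vec = 0.5, [1.0, 2.0, 4.0]      # time and space separation (assumed)

k_up = [omega] + k_vec               # k^mu = (omega, k)
x_up = [t] + x_vec                   # x^mu = (t, x)

# Lower the index: k_mu = eta_{mu nu} k^nu = (-omega, k).
k_down = [eta_diag[m] * k_up[m] for m in range(4)]

# Inner product <k~, x> = k_mu x^mu.
phase = sum(k_down[m] * x_up[m] for m in range(4))

# Direct evaluation of k.x - omega t for comparison.
expected = sum(ki * xi for ki, xi in zip(k_vec, x_vec)) - omega * t

print(k_down)            # [-3.0, 2.0, 0.0, 1.0]
print(phase, expected)   # 4.5 4.5
```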
${ }^{13}$ As usual we denote one coordinate system with primed indices and one without primes.
${ }^{14}$ The reason is the same: we want both (i) the vector $X^{\mu} \boldsymbol{e}_{\mu}$ and (ii) the scalar $X_{\mu} Y^{\mu}$ to be independent of coordinates. This also explains our notation, with 1-form components and basis vectors both carrying an index in the down position: this tells us that they transform the same way.
${ }^{15}$ One thing that the bold-symbol notation for a tensor $\boldsymbol{T}(,)$ lacks is clear guidance on the valence of the tensor, which must be given separately in the form $(m, n)$. One solution to this is to use abstract index notation, invented by Roger Penrose (1931-). The idea here is to specify the slots using indices, so a $(2,1)$ tensor would be written as $T^{a b}{ }_{c}$. The indices here are not the components; to express those we need to ensure that we specify components and slots with different letters. One common convention is to use Roman letters for slots and Greek letters for components, so that the components of $\boldsymbol{T}$ would be $T^{\mu \nu}{ }_{\rho}$. Clearly there is potential for confusion here, so it's necessary to know the convention being adopted.
To extract a number from a tensor, we insert 1-forms $\tilde{Z}_{a}$ and $\tilde{Z}_{b}$ and a vector $A^{c}$, balancing Roman indices to obtain
\begin{equation*} T^{a b}{ }_{c} \tilde{Z}_{a} \tilde{Z}_{b} A^{c}, \tag{4.38} \end{equation*}
where we note that the letters denote the relevant slot, rather than an instruction to sum on an index. We don't use abstract index notation here, although some of the more advanced textbooks in the subject (e.g. Wald) do use it.
${ }^{16}$ A mixed object is a tensor with a valence $(m, n)$ where $m, n \neq 0$, that is, a tensor whose components carry both up and down indices.
Fig. 4.6 The tensor as a slot machine. It has $m$ slots for 1-forms and $n$ slots for vectors. If you insert those and turn the handle (metaphorically) then it spits out a number.
We can now summarize how to transform components and basis vectors:
\begin{align*} & X^{\beta^{\prime}}=\Lambda^{\beta^{\prime}}{ }_{\alpha} X^{\alpha}, \quad \sigma_{\beta^{\prime}}=\Lambda^{\alpha}{ }_{\beta^{\prime}} \sigma_{\alpha}, \\ & \boldsymbol{e}_{\beta^{\prime}}=\Lambda^{\alpha}{ }_{\beta^{\prime}} \boldsymbol{e}_{\alpha}, \quad \boldsymbol{\omega}^{\beta^{\prime}}=\Lambda^{\beta^{\prime}}{ }_{\alpha} \boldsymbol{\omega}^{\alpha} . \tag{4.37} \end{align*}

4.4 Tensors

Let's now look at the general concept of a tensor. Generally speaking, the tensor $\boldsymbol{T}$ is a linear slot machine with $m$ slots for inputting 1-forms and $n$ slots to input vectors (see Fig. 4.6). We have to specify how many of each by specifying the valence $(m, n)$ of the tensor.${ }^{15}$
For vectors we can write an expression relating the vector to its components and basis vectors, $\boldsymbol{X}=X^{\mu} \boldsymbol{e}_{\mu}$, and an analogous expression for 1-forms, $\tilde{\boldsymbol{\sigma}}=\sigma_{\mu} \boldsymbol{\omega}^{\mu}$. To write a similar expression for tensors we need to use the outer product between basis vectors, denoted by $\otimes$. This symbol is simply a means of denoting the slot machine character of the tensor. Its key property is that it maintains the ordering of the slots. This idea is best understood by considering an example.

Example 4.7

Consider a tensor $\boldsymbol{e}_{1} \otimes \boldsymbol{e}_{2}$: in words, the outer product of the basis vector in the 1 direction and the basis vector in the 2 direction. This is an object with two slots that takes two 1-forms. The outer product symbol $\otimes$ simply tells us that the first slot refers to $\boldsymbol{e}_{1}$ and the second to $\boldsymbol{e}_{2}$. Inserting 1-forms $\tilde{\boldsymbol{\alpha}}=\alpha_{\mu} \boldsymbol{\omega}^{\mu}$ and $\tilde{\boldsymbol{\beta}}=\beta_{\nu} \boldsymbol{\omega}^{\nu}$, we have
\begin{align*} \boldsymbol{e}_{1} \otimes \boldsymbol{e}_{2}(\tilde{\boldsymbol{\alpha}}, \tilde{\boldsymbol{\beta}}) & =\boldsymbol{e}_{1}(\tilde{\boldsymbol{\alpha}}) \boldsymbol{e}_{2}(\tilde{\boldsymbol{\beta}}) \\ & =\alpha_{\lambda}\left\langle\boldsymbol{\omega}^{\lambda}, \boldsymbol{e}_{1}\right\rangle \beta_{\rho}\left\langle\boldsymbol{\omega}^{\rho}, \boldsymbol{e}_{2}\right\rangle \\ & =\alpha_{\lambda} \delta^{\lambda}{ }_{1} \beta_{\rho} \delta^{\rho}{ }_{2} \\ & =\alpha_{1} \beta_{2} . \tag{4.39} \end{align*}
For a mixed object${ }^{16}$ like $\boldsymbol{\omega}^{2} \otimes \boldsymbol{e}_{3}$, that is, a tensor formed from the outer product of the basis 1-form for the 2 direction and the basis vector for the 3 direction, let's enter a vector $\boldsymbol{v}=v^{\mu} \boldsymbol{e}_{\mu}$ in the first slot and a 1-form $\tilde{\boldsymbol{\alpha}}$ in the second to find
\begin{align*} \boldsymbol{\omega}^{2} \otimes \boldsymbol{e}_{3}(\boldsymbol{v}, \tilde{\boldsymbol{\alpha}}) & =\boldsymbol{\omega}^{2}(\boldsymbol{v}) \boldsymbol{e}_{3}(\tilde{\boldsymbol{\alpha}}) \\ & =v^{\mu}\left\langle\boldsymbol{\omega}^{2}, \boldsymbol{e}_{\mu}\right\rangle \alpha_{\lambda}\left\langle\boldsymbol{\omega}^{\lambda}, \boldsymbol{e}_{3}\right\rangle \\ & =v^{2} \alpha_{3} \tag{4.40} \end{align*}
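The slot-machine reading of the outer product translates directly into functions that return functions. The sketch below is one possible way to model eqns 4.39 and 4.40, assuming that vectors and 1-forms are represented by plain lists of components indexed 0 to 3; the helper names `basis_vector`, `basis_one_form` and `outer` are hypothetical.

```python
def basis_vector(i):
    """e_i as a machine: feed it a 1-form (down components), get alpha_i."""
    return lambda one_form: one_form[i]


def basis_one_form(i):
    """omega^i as a machine: feed it a vector (up components), get v^i."""
    return lambda vector: vector[i]


def outer(first, second):
    """first (x) second: slot 1 belongs to `first`, slot 2 to `second`."""
    return lambda a, b: first(a) * second(b)


alpha = [10, 11, 12, 13]   # alpha_mu (assumed values)
beta = [20, 21, 22, 23]    # beta_nu (assumed values)
v = [1, 2, 3, 4]           # v^mu (assumed values)

e1_e2 = outer(basis_vector(1), basis_vector(2))     # e_1 (x) e_2
w2_e3 = outer(basis_one_form(2), basis_vector(3))   # omega^2 (x) e_3

print(e1_e2(alpha, beta))   # alpha_1 beta_2 = 11 * 22 = 242
print(w2_e3(v, alpha))      # v^2 alpha_3 = 3 * 13 = 39
print(e1_e2(beta, alpha))   # beta_1 alpha_2 = 21 * 12 = 252
```

The last line swaps the inputs and gives a different number, which illustrates why the ordering of slots maintained by $\otimes$ matters.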

Example 4.8

We could write a $(3,1)$ tensor
\begin{equation*} \boldsymbol{S}(,,,) \tag{4.41} \end{equation*}
To find the components of the $(3,1)$ tensor, we simply fill its slots with three basis 1-forms and one basis vector
\begin{equation*} S^{\mu \nu \lambda}{ }_{\sigma}=\boldsymbol{S}\left(\boldsymbol{\omega}^{\mu}, \boldsymbol{\omega}^{\nu}, \boldsymbol{\omega}^{\lambda}, \boldsymbol{e}_{\sigma}\right) \tag{4.42} \end{equation*}
An expression for $\boldsymbol{S}$ in terms of its components can be written as
\begin{equation*} \boldsymbol{S}(,,,)=S^{\mu \nu \lambda}{ }_{\sigma} \boldsymbol{e}_{\mu}() \otimes \boldsymbol{e}_{\nu}() \otimes \boldsymbol{e}_{\lambda}() \otimes \boldsymbol{\omega}^{\sigma}() \tag{4.43} \end{equation*}
This is an object into which we can insert three 1-forms and a vector. Inserting 1-forms $\tilde{\boldsymbol{\zeta}}$, $\tilde{\boldsymbol{\eta}}$ and $\tilde{\boldsymbol{\chi}}$ and the vector $\boldsymbol{u}$ into our example tensor $\boldsymbol{S}$, we find
\begin{align*} \boldsymbol{S}(\tilde{\boldsymbol{\zeta}}, \tilde{\boldsymbol{\eta}}, \tilde{\boldsymbol{\chi}}, \boldsymbol{u}) & =S^{\mu \nu \lambda}{ }_{\sigma} \zeta_{\alpha} \underbrace{\left\langle\boldsymbol{\omega}^{\alpha}, \boldsymbol{e}_{\mu}\right\rangle}_{\delta^{\alpha}{ }_{\mu}} \eta_{\beta} \underbrace{\left\langle\boldsymbol{\omega}^{\beta}, \boldsymbol{e}_{\nu}\right\rangle}_{\delta^{\beta}{ }_{\nu}} \chi_{\gamma} \underbrace{\left\langle\boldsymbol{\omega}^{\gamma}, \boldsymbol{e}_{\lambda}\right\rangle}_{\delta^{\gamma}{ }_{\lambda}} u^{\delta} \underbrace{\left\langle\boldsymbol{\omega}^{\sigma}, \boldsymbol{e}_{\delta}\right\rangle}_{\delta^{\sigma}{ }_{\delta}} \\ & =S^{\mu \nu \lambda}{ }_{\sigma} \zeta_{\mu} \eta_{\nu} \chi_{\lambda} u^{\sigma} . \tag{4.44} \end{align*}
Tensors are independent of coordinate system, but their components depend on the details of the coordinates. How do tensor components transform? We use the tensor transformation law that says that the transformation is carried out by a multiplication of transformation matrices, one for each index. Specifically, every up index $\mu$ is transformed by a matrix $\Lambda^{\alpha^{\prime}}{ }_{\mu}=\partial x^{\alpha^{\prime}} / \partial x^{\mu}$ and every down index $\sigma$ is transformed by a matrix $\partial x^{\sigma} / \partial x^{\beta^{\prime}}$. Our example tensor therefore transforms as
\begin{equation*} S^{\mu^{\prime} \nu^{\prime} \lambda^{\prime}}{ }_{\sigma^{\prime}}=\frac{\partial x^{\mu^{\prime}}}{\partial x^{\mu}} \frac{\partial x^{\nu^{\prime}}}{\partial x^{\nu}} \frac{\partial x^{\lambda^{\prime}}}{\partial x^{\lambda}} \frac{\partial x^{\sigma}}{\partial x^{\sigma^{\prime}}} S^{\mu \nu \lambda}{ }_{\sigma} \tag{4.45} \end{equation*}
In coordinate-based treatments, the tensor transformation law is used to define tensors, but we prefer to use the slot-machine definition which is much cleaner.${ }^{17}$
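The link between the two viewpoints can be tested numerically: if the components transform as in eqn 4.45 and the inputs transform as in eqn 4.37, the number produced by eqn 4.44 should not change. The sketch below assumes a 1+1 dimensional boost (to keep the sums short) and arbitrary made-up components; the function names are hypothetical.

```python
import math
from itertools import product

N = 2                  # 1+1 dimensions (t, x) keeps the sums small
v = 0.6                # illustrative boost speed, c = 1
g = 1.0 / math.sqrt(1.0 - v * v)
L = [[g, -g * v], [-g * v, g]]     # L[a][m]    = dx^{a'}/dx^{m}
Linv = [[g, g * v], [g * v, g]]    # Linv[s][b] = dx^{s}/dx^{b'}

idx = list(product(range(N), repeat=4))
# Arbitrary components S^{mu nu lambda}_sigma keyed by (mu, nu, lambda, sigma).
S = {(m, n, l, s): float(1 + m + 2 * n + 4 * l + 8 * s) for (m, n, l, s) in idx}


def transform(S):
    """Eqn 4.45: one transformation matrix for each index of S."""
    return {(mp, np_, lp, sp): sum(L[mp][m] * L[np_][n] * L[lp][l]
                                   * Linv[s][sp] * S[(m, n, l, s)]
                                   for (m, n, l, s) in idx)
            for (mp, np_, lp, sp) in idx}


def evaluate(S, zeta, eta, chi, u):
    """Eqn 4.44: fill three 1-form slots and one vector slot."""
    return sum(S[(m, n, l, s)] * zeta[m] * eta[n] * chi[l] * u[s]
               for (m, n, l, s) in idx)


def primed_up(X):
    """Up components transform with L."""
    return [sum(L[a][m] * X[m] for m in range(N)) for a in range(N)]


def primed_down(w):
    """Down components transform with the inverse matrix."""
    return [sum(Linv[n][b] * w[n] for n in range(N)) for b in range(N)]


zeta, eta, chi = [1.0, 2.0], [0.5, -1.0], [3.0, 1.0]   # 1-form components
u = [2.0, -0.5]                                        # vector components

unprimed = evaluate(S, zeta, eta, chi, u)
primed = evaluate(transform(S), primed_down(zeta), primed_down(eta),
                  primed_down(chi), primed_up(u))
print(unprimed, primed)
```

The two printed values agree up to rounding, as expected for a coordinate-independent slot machine.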
Example 4.9
The $(0,2)$ metric tensor is written as
\begin{equation*} \boldsymbol{\eta}(,)=\eta_{\mu \nu} \boldsymbol{\omega}^{\mu}() \otimes \boldsymbol{\omega}^{\nu}() . \tag{4.46} \end{equation*}
This tensor has components that can be extracted: $\boldsymbol{\eta}\left(\boldsymbol{e}_{\alpha}, \boldsymbol{e}_{\beta}\right)=\eta_{\alpha \beta}$. We can now see explicitly what happens if we insert a vector $\boldsymbol{v}$ into one of the slots
\begin{align*}
\boldsymbol{\eta}(\boldsymbol{v},\,) & =\eta_{\mu \nu} \boldsymbol{\omega}^{\mu}(\boldsymbol{v}) \boldsymbol{\omega}^{\nu}(\,) \\
& =\eta_{\mu \nu} v^{\sigma}\left\langle\boldsymbol{e}_{\sigma}, \boldsymbol{\omega}^{\mu}\right\rangle \boldsymbol{\omega}^{\nu}(\,) \\
& =\eta_{\mu \nu} v^{\mu} \boldsymbol{\omega}^{\nu}(\,) \\
& =v_{\nu} \boldsymbol{\omega}^{\nu}(\,), \tag{4.47}
\end{align*}
where, in the last step, we've used the components of the metric tensor to lower an index. We see that the output is, as we predicted, a 1-form with components $v_{\nu}$.
The tensor above has valence $(0,2)$, but we can also define a $(2,0)$ version
\begin{equation*}
\boldsymbol{\eta}(\,,\,)=\eta^{\mu \nu} \boldsymbol{e}_{\mu} \otimes \boldsymbol{e}_{\nu} \tag{4.48}
\end{equation*}
where $\eta_{\mu \nu} \eta^{\nu \sigma}=\delta^{\sigma}{ }_{\mu}$ (implying $\eta_{\mu \nu}=\eta^{\mu \nu}$). The $(2,0)$ version of the tensor inputs two 1-forms and can map a 1-form to a vector.
Finally, note that we can use the metric tensor on the components of a tensor to raise or lower them, one at a time. So we have, for example, that
\begin{equation*}
S^{\mu \nu} \eta_{\nu \beta}=S^{\mu}{ }_{\beta}, \quad T_{\mu \nu} \eta^{\mu \alpha}=T^{\alpha}{ }_{\nu}, \tag{4.49}
\end{equation*}
${ }^{17}$ There is an unfortunate tendency for general relativity to become 'death by indices'. Our use of coordinate-free objects, such as $\boldsymbol{S}(\,,\,,\,,\,)$, is intended to avoid this, and this way of writing equations is sometimes called index-free notation. Imagine if you had learnt electromagnetism just in terms of coordinates, without ever having seen vector notation. Efficient notation can declutter equations and (hopefully) make the physics more transparent.
${ }^{18}$ Spoiler alert: the Einstein equation (which we will get to properly at the end of Part II) has the form
\begin{equation*}
\left(\begin{array}{c}\text{Curvature} \\ \text{of} \\ \text{spacetime}\end{array}\right)=\left(\begin{array}{c}\text{Mass-energy} \\ \text{density at} \\ \text{this point}\end{array}\right)
\end{equation*}
The right-hand side of this equation will be related to the energy-momentum tensor.
${ }^{19}$ Astrophysicists like talking about dust, as there's a lot of it about in the Universe. The term refers to solid particles that can be anything from a few molecules up to macroscopic size, and for our purposes we are going to assume that they are just bits of mass, distributed in space, at a low enough density that they don't interact with each other.
and so on.
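As a quick numerical check of index raising and lowering, here is a minimal sketch in plain Python. It assumes the Minkowski components $\operatorname{diag}(-1,1,1,1)$ used in this book; the helper names (`lower`, `lower_second_index`) and the sample components are our own illustrative choices, not anything defined in the text.

```python
# Minkowski metric components; eta_{mu nu} and eta^{mu nu} are numerically equal.
ETA = [[-1, 0, 0, 0],
       [0, 1, 0, 0],
       [0, 0, 1, 0],
       [0, 0, 0, 1]]

def lower(v):
    """Lower the index of a vector: v_nu = eta_{mu nu} v^mu."""
    return [sum(ETA[mu][nu] * v[mu] for mu in range(4)) for nu in range(4)]

def lower_second_index(S):
    """Lower the second index of a (2,0) tensor: S^mu_beta = S^{mu nu} eta_{nu beta}."""
    return [[sum(S[mu][nu] * ETA[nu][beta] for nu in range(4))
             for beta in range(4)] for mu in range(4)]

v_up = [2, 1, 0, 3]       # hypothetical components v^mu
v_down = lower(v_up)      # only the time component changes sign

# A hypothetical (2,0) tensor built as S^{mu nu} = v^mu v^nu.
S_up = [[a * b for b in v_up] for a in v_up]
S_mixed = lower_second_index(S_up)   # components S^mu_beta = v^mu v_beta
```

Because $\eta_{\mu\nu}$ is diagonal, each lowering simply flips the sign of the timelike slot being lowered, which is why the indices can be moved one at a time.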
This section has contained a lot of formalism, but let's finish it with a couple of very simple corollaries that shouldn't be forgotten amidst all the mathematical manipulations.
  • If two tensors $\boldsymbol{A}$ and $\boldsymbol{B}$ are equal to each other in one frame, they will be equal to each other in all frames. This is obvious if you think of the tensors in a coordinate-free way. Alternatively, construct the tensor $\boldsymbol{C}=\boldsymbol{A}-\boldsymbol{B}$, which is zero in that frame, so all its components are zero there. Its components will clearly still all be zero after multiplication by any transformation matrix.
  • A scalar [which is a $(0,0)$ tensor] takes the same numerical value in all frames. (An example is the Ricci scalar, to be introduced in Chapter 11.) Therefore, if you evaluate a scalar in the most convenient frame, you have got it for all frames.
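The second corollary is easy to test numerically. The sketch below (plain Python, $c=1$) boosts a vector along $x$ and checks that the scalar $\eta_{\mu\nu}a^{\mu}a^{\nu}$ is unchanged. The boost matrix, the function names and the sample components are illustrative assumptions of ours.

```python
import math

def boost_x(v):
    """Lorentz boost matrix for relative speed v along x (c = 1)."""
    g = 1.0 / math.sqrt(1.0 - v * v)
    return [[g, -g * v, 0.0, 0.0],
            [-g * v, g, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def apply(L, a):
    """Transform vector components: a'^mu = L^mu_nu a^nu."""
    return [sum(L[mu][nu] * a[nu] for nu in range(4)) for mu in range(4)]

def norm2(a):
    """The scalar eta_{mu nu} a^mu a^nu with signature (-,+,+,+)."""
    return -a[0] ** 2 + a[1] ** 2 + a[2] ** 2 + a[3] ** 2

a = [3.0, 1.0, 2.0, 0.5]            # hypothetical components in frame S
a_prime = apply(boost_x(0.6), a)    # components in the boosted frame S'
```

The individual components of `a_prime` differ from those of `a`, but `norm2` returns the same number for both, up to rounding.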

4.5 Energy-momentum tensor

As a payoff for all of this formalism, we introduce one of the most important tensors in all of physics: the energy-momentum tensor. This tensor gives the (physical) right-hand side of the Einstein equation of general relativity.${ }^{18}$
Let's start off by considering a set of dust particles in spacetime,${ }^{19}$ each of mass $m$, and imagine that in some frame $S$ these particles are all distributed in space but are at rest. Their energy will just be $m$ per particle (remember that, if we reinstate the factors of $c$, this would be $m c^{2}$ per particle), and if there are $n_{0}$ particles per unit volume the energy density will be $n_{0} m$. In another inertial frame $S^{\prime}$, the energy becomes $\gamma m$ per particle [acquiring a factor of $\gamma$ because the particles are now moving with speed $v$, and $\gamma=\left(1-v^{2}\right)^{-1 / 2}$] and the energy density becomes $\gamma^{2} n_{0} m$ (acquiring a second factor of $\gamma$ because the region containing the particles in $S$ will have become Lorentz contracted in $S^{\prime}$ by a factor of $\gamma$, increasing the density). Energy density therefore transforms with two factors of $\gamma$ and this indicates that it is part of a second-rank tensor.
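The two factors of $\gamma$ can be checked with a short calculation. The sketch below (plain Python, $c=1$) assumes that in the dust rest frame the only non-zero component of the second-rank tensor is the energy density $n_{0} m$ in the time-time slot, and then applies the transformation law for two up indices with a boost along $x$. The boost matrix, the helper names and the sample values of $n_{0}$, $m$ and $v$ are illustrative assumptions.

```python
import math

def boost_x(v):
    """Lorentz boost matrix for relative speed v along x (c = 1)."""
    g = 1.0 / math.sqrt(1.0 - v * v)
    return [[g, -g * v, 0.0, 0.0],
            [-g * v, g, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def transform_up_up(L, T):
    """T'^{mu nu} = L^mu_alpha L^nu_beta T^{alpha beta}: one matrix per index."""
    return [[sum(L[m][a] * L[n][b] * T[a][b]
                 for a in range(4) for b in range(4))
             for n in range(4)] for m in range(4)]

n0, mass, v = 5.0, 2.0, 0.6          # hypothetical sample values
rho0 = n0 * mass                     # rest-frame energy density
gamma = 1.0 / math.sqrt(1.0 - v * v)

# Rest frame S: energy density only.
T_rest = [[rho0 if (a == 0 and b == 0) else 0.0 for b in range(4)]
          for a in range(4)]

# Frame S': the dust now moves with speed v along x.
T_prime = transform_up_up(boost_x(v), T_rest)
energy_density_prime = T_prime[0][0]   # picks up gamma from each index
```

Each index contributes one transformation matrix, and hence one factor of $\gamma$, to the time-time component.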
To understand what this second-rank tensor could be, let's take a step back and think about particle current. Recall that the particle current can be expressed as a 4-vector $\boldsymbol{J}=n_{0} \boldsymbol{u}$, where here $n_{0}$ is the density of dust particles in their rest frame and $\boldsymbol{u}$ is the 4-velocity of the assembly of dust particles. The time-component of this current tells us about the number density $n=\gamma n_{0}$ of the particles [remember from Chapter 2 that $\boldsymbol{u}=\gamma(1, \vec{v})$ so $\boldsymbol{J}=\gamma n_{0}(1, \vec{v})$] and each spatial component of this current tells us the flux of particles along that direction (e.g. in Cartesian coordinates, $J^{x}$ tells us about the number of particles crossing the $yz$ plane, per unit area, per unit time).
This is all useful for thinking about the flux of particles, but what if we want to understand the flux of 4-momentum? That's a really interesting question because we would like to know how energy and momentum are transported across spacetime. The problem is that, unlike the number of particles, which is a scalar, the 4-momentum is a 4-vector, and so its flux has to be a more complicated object than a 4-vector. This confirms that the object needed to describe the flux of momentum will need to be a${ }^{20}$ second-rank tensor since it depends on the 4-momentum and the 4-current. We call this object the energy-momentum tensor${ }^{21}$ $\boldsymbol{T}(\,,\,)$, and it has two slots (or in components, it will be a second-rank tensor). We will define it (for now) as the symmetric tensor
\begin{equation*}
\boldsymbol{T}(\,,\,)=\tilde{\boldsymbol{J}}(\,) \otimes \tilde{\boldsymbol{p}}(\,), \tag{4.51}
\end{equation*}
and from this it's readily apparent that $\boldsymbol{T}(\,,\,)$ is a $(0,2)$ object that inputs two vectors and has components $T_{\mu \nu}=\boldsymbol{T}\left(\boldsymbol{e}_{\mu}, \boldsymbol{e}_{\nu}\right)=J_{\mu} p_{\nu}$. We can, of course, rewrite this tensor in other ways and we could define it as a symmetric $(2,0)$ object, with upstairs indices on its components $T^{\mu \nu}$.${ }^{22}$

Example 4.10

To see what $\boldsymbol{T}$ looks like in practice, let's stick with components to begin with and evaluate everything in a frame in which the number density $J^{0} \equiv n=\gamma n_{0}$. The $(2,0)$ version of the energy-momentum tensor for the cloud of particles can then be written as${ }^{23}$ $T^{\mu \nu}=J^{\mu} p^{\nu}=\left(n_{0} u^{\mu}\right)\left(m u^{\nu}\right)$, where $\boldsymbol{u}$ is the velocity of the cloud with components $u^{\mu}$.
  • The time-time element $T^{00}$ is then just the energy $p^{0}=\gamma m$ multiplied by $J^{0}=\gamma n_{0}=n$, and hence $T^{00}=\gamma n m$ is equal to the energy density.
  • The space-time and time-space elements $T^{i 0}$ and $T^{0 i}$ are $n \gamma m v^{i}$ and hence correspond to the density of the $i$th component of the momentum.
  • The space-space elements $T^{i j}$ are $n \gamma m v^{i} v^{j}$ and are momentum fluxes which, as discussed below, correspond to stresses.
Another way of looking at the components is to say that the energy-momentum tensor $T^{\mu \nu}$ tells us the flux of the 4-momentum $p^{\mu}$ that crosses a surface of constant $x^{\nu}$. In particular, this means that
  • $T^{00}$ is the energy density, since it is the flux of $p^{0}$ (energy) crossing a surface of constant time (i.e. filling space).
  • $T^{0 i}=T^{i 0}$ is the mass flux across a surface of constant $x^{i}$, which is equivalent to the density of the $i$th component of linear momentum.
  • $T^{i j}$ is the $ij$ component of the usual stress tensor, meaning that the off-diagonal terms are shear stresses and the diagonal terms ($T^{i i}$) correspond to pressures.
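The component expressions listed above can be reproduced by building $T^{\mu\nu}=J^{\mu}p^{\nu}$ directly from the 4-velocity. The following is a minimal sketch in plain Python with $c=1$; the function name `dust_tensor` and the sample values of $n_{0}$, $m$ and the 3-velocity are hypothetical choices made for illustration.

```python
import math

def dust_tensor(n0, m, vel):
    """Return (T, gamma), where T^{mu nu} = (n0 u^mu)(m u^nu) for dust with 3-velocity vel."""
    vx, vy, vz = vel
    gamma = 1.0 / math.sqrt(1.0 - (vx * vx + vy * vy + vz * vz))
    u = [gamma, gamma * vx, gamma * vy, gamma * vz]   # 4-velocity u^mu
    J = [n0 * c for c in u]                           # particle current J^mu
    p = [m * c for c in u]                            # 4-momentum p^mu
    T = [[J[a] * p[b] for b in range(4)] for a in range(4)]
    return T, gamma

n0, m = 5.0, 2.0              # hypothetical rest-frame density and particle mass
vel = (0.3, 0.4, 0.0)         # hypothetical 3-velocity (speed 0.5)
T, gamma = dust_tensor(n0, m, vel)
n = gamma * n0                # number density in this frame

energy_density = T[0][0]      # gamma * n * m
momentum_density_x = T[0][1]  # n * gamma * m * v^x
stress_xy = T[1][2]           # n * gamma * m * v^x * v^y
```

The tensor comes out symmetric because both $J^{\mu}$ and $p^{\mu}$ are proportional to the same 4-velocity.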
    Two very simple examples of this tensor are as follows:
    (1) A set of dust particles at rest. These only have energy density, and are not moving and so have no linear momentum. Hence, in their rest frame, we have
\begin{equation*}
T^{\mu \nu}=T_{\mu \nu}=\left(\begin{array}{cccc}
\rho & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0
\end{array}\right) \tag{4.53}
\end{equation*}
(2) An isotropic fluid in equilibrium. The particles in the fluid exert${ }^{24}$ a pressure $p$, but have no preferred direction (meaning that $T^{i 0}=0$ and $T^{i j}$ has no off-diagonal components). Hence, in the rest frame of the fluid, we have
\begin{equation*}
T^{\mu \nu}=T_{\mu \nu}=\left(\begin{array}{cccc}
\rho & 0 & 0 & 0 \\
0 & p & 0 & 0 \\
0 & 0 & p & 0 \\
0 & 0 & 0 & p
\end{array}\right) \tag{4.54}
\end{equation*}
We will return to this problem in much more detail in Chapter 12.
${ }^{20}$ In general, a second-rank tensor has $(m, n)=(2,0)$ or $(m, n)=(0,2)$.
${ }^{21}$ This is also known as the stress-energy tensor.
${ }^{22}$ That is, we transform
\begin{equation*}
T^{\mu \nu}=\eta^{\alpha \mu} \eta^{\beta \nu} T_{\alpha \beta} \tag{4.52}
\end{equation*}
Since $\eta_{\mu \nu}=\eta^{\mu \nu}$ is a diagonal tensor with components $\operatorname{diag}(-1,1,1,1)$, we see that raising or lowering a timelike component earns us a minus sign. This allows us to see immediately that $T^{00}=T_{00}$, $T^{i i}=T_{i i}$, $T^{0 i}=-T_{0 i}$ and $T^{i 0}=-T_{i 0}$. This is a good example of a tensor for which the components can change when you move the indices from upstairs to downstairs (in contrast to $\eta_{\mu \nu}$, for which they do not, as explained on page 45).
${ }^{23}$ Reminder: We use $u^{\mu}=\left(\gamma, \gamma v^{i}\right)$, $J^{\mu}=n_{0} u^{\mu}=\left(n, n v^{i}\right)$ and $p^{\mu}=m u^{\mu}=\left(\gamma m, \gamma m v^{i}\right)$.
${ }^{24}$ Do not confuse pressure $p$ with momentum. The context should always make it clear, but it's a shame the two quantities have the same symbol.
${ }^{25}$ We need our previous results that for an observer with velocity $\boldsymbol{u}$, $\tilde{\boldsymbol{p}}(\boldsymbol{u})=-E$ and $\tilde{\boldsymbol{J}}(\boldsymbol{u})=-n$.
${ }^{26}$ In components, eqn 4.58 is $T_{\mu \nu} u^{\mu} u^{\nu}=n E$.
${ }^{27}$ This is guaranteed because the tensor is symmetric, i.e. $\boldsymbol{T}(\boldsymbol{u}, \boldsymbol{a})=\boldsymbol{T}(\boldsymbol{a}, \boldsymbol{u})$. In components, one could write eqns 4.59 and 4.60 as $T_{\mu \nu} u^{\mu} a^{\nu}=-n p_{\mu} a^{\mu}$ and $T_{\mu \nu} a^{\mu} u^{\nu}=-E J_{\mu} a^{\mu}$. These quantities are equal because, recalling that the observer is travelling with velocity $\boldsymbol{u}$, we have $\boldsymbol{J}=n_{0} \boldsymbol{v}=n \boldsymbol{p} /[\gamma(u) m]$ and $E=\gamma(u) m$, where $u$ is the speed of the observer relative to the measurement frame.
${ }^{28}$ In components, $T_{\mu \nu} u^{\mu}=-n p_{\nu}$.
Finally, let's use the elegant formalism of our slot machines to consider the energy-momentum tensor as a machine with two slots in it. This will allow us to read off the properties of our set of dust particles in the frame of an observer travelling with velocity vector $\boldsymbol{u}$. Our dust particles are described with a momentum 1-form $\tilde{\boldsymbol{p}}(\,)$ and so, following the results proved in Example 4.5, if we insert a velocity vector $\boldsymbol{u}$ into $\tilde{\boldsymbol{p}}(\,)$ then we will output (with a minus sign) the energy of the particle $E$, as measured by an observer (with velocity vector $\boldsymbol{u}$)
\begin{equation*}
E=-\tilde{\boldsymbol{p}}(\boldsymbol{u}) \tag{4.55}
\end{equation*}
The particle current 1-form is $\tilde{\boldsymbol{J}}(\,)$. Insert a velocity $\boldsymbol{u}$ and, with a minus sign, we output the number density of particles measured by the observer with velocity $\boldsymbol{u}$
\begin{equation*}
n=-\tilde{\boldsymbol{J}}(\boldsymbol{u}) \tag{4.56}
\end{equation*}
Example 4.11
A swarm of massive particles has a particle current $\boldsymbol{J}=n_{0} \boldsymbol{v}$, where $n_{0}$ is the number density of particles in the rest frame of the swarm, and each particle has rest mass $m$, velocity $\boldsymbol{v}$ and momentum $\boldsymbol{p}=m \boldsymbol{v}$. The mass density of the particles in the swarm's rest frame is $\rho_{0}=m n_{0}$. The $(0,2)$ energy-momentum tensor in this case is given by
\begin{equation*}
\boldsymbol{T}(\,,\,)=\tilde{\boldsymbol{J}}(\,) \otimes \tilde{\boldsymbol{p}}(\,)=\rho_{0} \tilde{\boldsymbol{v}}(\,) \otimes \tilde{\boldsymbol{v}}(\,) \tag{4.57}
\end{equation*}
Here $\tilde{\boldsymbol{v}}=v_{\mu} \boldsymbol{\omega}^{\mu}$ is the velocity 1-form for the fluid. This energy-momentum tensor has components $T_{\mu \nu}=\rho_{0} v_{\mu} v_{\nu}$. We can insert some vectors into the slots in order to understand the physical meaning of the components of $\boldsymbol{T}$.${ }^{25}$
(i) We start by inserting the observer's velocity vector $\boldsymbol{u}$ in both slots
\begin{align*}
\boldsymbol{T}(\boldsymbol{u}, \boldsymbol{u}) & =\tilde{\boldsymbol{J}}(\boldsymbol{u}) \otimes \tilde{\boldsymbol{p}}(\boldsymbol{u}) \\
& =n E . \tag{4.58}
\end{align*}
The output${ }^{26}$ is the energy density in the observer's rest frame.
(ii) Now enter a dimensionless vector $\boldsymbol{a}$ and the velocity to find
\begin{align*}
\boldsymbol{T}(\boldsymbol{u}, \boldsymbol{a}) & =\tilde{\boldsymbol{J}}(\boldsymbol{u}) \otimes \tilde{\boldsymbol{p}}(\boldsymbol{a}) \\
& =-n \tilde{\boldsymbol{p}}(\boldsymbol{a}), \tag{4.59}
\end{align*}
which is (minus) the momentum density pointing along vector $\boldsymbol{a}$, as measured by the observer.
(iii) Putting the vectors in the other way round, we find
\begin{align*}
\boldsymbol{T}(\boldsymbol{a}, \boldsymbol{u}) & =\tilde{\boldsymbol{J}}(\boldsymbol{a}) \otimes \tilde{\boldsymbol{p}}(\boldsymbol{u}) \\
& =-E \tilde{\boldsymbol{J}}(\boldsymbol{a}), \tag{4.60}
\end{align*}
which is (minus) the energy transported along the direction $\boldsymbol{a}$, according to the observer. This expression is equal to the particle momentum density transported along $\boldsymbol{a}$ (in eqn 4.59).${ }^{27}$
(iv) Now try entering a single velocity vector into one slot
\begin{equation*}
\boldsymbol{T}(\boldsymbol{u},\,)=-n \tilde{\boldsymbol{p}}(\,), \tag{4.61}
\end{equation*}
which${ }^{28}$ gives the 4-momentum density 1-form in the rest frame of the observer.
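The slot-machine results of this example can be verified numerically. The sketch below (plain Python, $c=1$) builds $T_{\mu\nu}=\rho_{0} v_{\mu} v_{\nu}$ for a swarm moving along $x$, inserts an observer velocity $\boldsymbol{u}$ and a unit vector along $x$, and compares with $n$ and $E$ computed from eqns 4.55 and 4.56. The specific speeds, the values of $n_{0}$ and $m$, and all function names are illustrative assumptions.

```python
import math

ETA_DIAG = [-1.0, 1.0, 1.0, 1.0]    # Minkowski metric, signature (-,+,+,+)

def four_velocity(speed_x):
    """4-velocity components for motion along x with the given speed."""
    gamma = 1.0 / math.sqrt(1.0 - speed_x ** 2)
    return [gamma, gamma * speed_x, 0.0, 0.0]

def lower(vec):
    """Components of the 1-form obtained by lowering a vector's index."""
    return [ETA_DIAG[mu] * vec[mu] for mu in range(4)]

def pair(form, vec):
    """Insert a vector into a 1-form: form_mu vec^mu."""
    return sum(f * x for f, x in zip(form, vec))

n0, m = 5.0, 2.0                    # hypothetical rest-frame density and mass
rho0 = m * n0
v = four_velocity(0.6)              # swarm velocity (hypothetical)
u = four_velocity(-0.2)             # observer velocity (hypothetical)
v_form = lower(v)

# Components T_{mu nu} = rho0 v_mu v_nu.
T_low = [[rho0 * v_form[mu] * v_form[nu] for nu in range(4)] for mu in range(4)]

def T(a, b):
    """Fill both slots: T(a, b) = T_{mu nu} a^mu b^nu."""
    return sum(T_low[mu][nu] * a[mu] * b[nu]
               for mu in range(4) for nu in range(4))

J_form = [n0 * c for c in v_form]   # particle current 1-form
p_form = [m * c for c in v_form]    # momentum 1-form
n = -pair(J_form, u)                # eqn 4.56
E = -pair(p_form, u)                # eqn 4.55

e_x = [0.0, 1.0, 0.0, 0.0]
energy_density = T(u, u)            # should equal n E (eqn 4.58)
momentum_term = T(u, e_x)           # should equal -n p~(e_x) (eqn 4.59)
```

As a cross-check, `n` and `E` agree with $\gamma_{\mathrm{rel}} n_{0}$ and $\gamma_{\mathrm{rel}} m$, where $\gamma_{\mathrm{rel}}$ is the Lorentz factor for the relative speed of swarm and observer obtained from the relativistic velocity-addition formula.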

Chapter summary

  • 1-forms can be viewed as a set of equally spaced planes. They combine with vectors to form numbers.
  • Tensors are linear slot machines that input vectors and 1-forms and output numbers.
  • The energy-momentum tensor $\boldsymbol{T}$ is a $(0,2)$ [or $(2,0)$] symmetric tensor with components $T_{\mu \nu}$ [or $T^{\mu \nu}$]. It tells us about the flux of the 4-momentum, and the time-time component $T^{00}$ gives us access to the energy density.

Exercises

(4.1) A tensor $\boldsymbol{W}$ has components $W^{\alpha \beta}=u^{\alpha} v^{\beta}$ where $\boldsymbol{u}$ and $\boldsymbol{v}$ are 4-vectors. Show that $\boldsymbol{W}$ transforms properly as a tensor.
(4.2) Show that $\delta^{\alpha}{ }_{\beta}$ transforms properly as a tensor. What about $\delta_{\alpha \beta}$ and $\delta^{\alpha \beta}$?
(4.3) Show that if you take a tensor $\boldsymbol{S}(\,,\,)$ with components $S^{\mu}{ }_{\nu}$ you can construct a Lorentz invariant scalar by evaluating $S^{\mu}{ }_{\mu}$. (Remember that using the summation convention, this involves evaluating $\sum_{\mu=0}^{3} S^{\mu}{ }_{\mu}$.)
(4.4) Consider flat space in spherical polar coordinates.
(a) Compute the basis 1-forms $\boldsymbol{\omega}^{r}, \boldsymbol{\omega}^{\theta}$ and $\boldsymbol{\omega}^{\phi}$ in terms of the Cartesian basis 1-forms $\boldsymbol{\omega}^{x}, \boldsymbol{\omega}^{y}$ and $\boldsymbol{\omega}^{z}$.
(b) Write basis vectors $\boldsymbol{e}_{r}, \boldsymbol{e}_{\theta}$ and $\boldsymbol{e}_{\phi}$ in terms of the usual Cartesian basis $\boldsymbol{e}_{x}, \boldsymbol{e}_{y}$ and $\boldsymbol{e}_{z}$.
(c) Using these results, show that $\left\langle\boldsymbol{\omega}^{\mu}, \boldsymbol{e}_{\nu}\right\rangle=0$ for $\mu \neq \nu$, where the indices are $r, \theta$ and $\phi$.
(4.5) Consider a coordinate system $(u, v, w)$ related to Cartesian coordinates via
\begin{align*}
& x=u-v, \\
& y=u+v, \\
& z=-2 u v+w . \tag{4.62}
\end{align*}
Compute (a) basis vectors and (b) basis 1-forms, in terms of the Cartesian $\boldsymbol{e}_{\mu}$ and $\boldsymbol{\omega}^{\mu}$.
(4.6) Consider the invariant
\begin{equation*}
M=\boldsymbol{a} \cdot \boldsymbol{a}=\eta_{\mu \nu} a^{\mu} a^{\nu} \tag{4.63}
\end{equation*}
where $\eta_{\mu \nu}$ are the components of the metric and $a^{\mu}$ are the components of the vector $\boldsymbol{a}$.
(a) Explain why
\begin{equation*}
\frac{\partial M}{\partial a^{\mu}}=2 \eta_{\mu \nu} a^{\nu} \tag{4.64}
\end{equation*}
(b) Show that
\begin{equation*}
a^{\mu} \frac{\partial M}{\partial a^{\mu}}=2 M \tag{4.65}
\end{equation*}
(4.7) Prove the useful relation that, for a vector $\boldsymbol{A}$,
\begin{equation*}
\frac{\mathrm{d}}{\mathrm{d} \tau}(\boldsymbol{A} \cdot \boldsymbol{A})=2 \eta_{\mu \nu} A^{\mu} \frac{\mathrm{d} A^{\nu}}{\mathrm{d} \tau} \tag{4.66}
\end{equation*}
(4.8) The Doppler effect in special relativity. By considering the Lorentz-transformation properties of the wavevector 4-vector $\boldsymbol{k}$ with components $k^{\mu}=(\omega, \vec{k})$, derive the expression for the Doppler effect on the frequency of the wave, as predicted by special relativity.

5 The metric

5.1 Metrics in general
5.2 Meet some metrics
5.3 Light and light cones
Chapter summary
Exercises

${ }^{1}$ There's scope for confusion here as a metric is an object into which we input two vectors. The order here is that we input a position in spacetime into the metric field and output a metric for that point in spacetime. We can then insert two vectors into that metric tensor and find their scalar product at that point in spacetime.
${ }^{2}$ It will turn out that the metric also features on the right-hand side of this equation, with the result that the equation is difficult to solve.
...with the measure you use, it will be measured to you (Matthew 7:2)
In the previous chapter, we learned that a tensor is a machine for turning vectors and 1-forms into scalars. We have also seen that the metric tensor $\boldsymbol{\eta}(\,,\,)$ is a $(0,2)$ tensor (meaning that you feed it two vectors and it spits out a number) that tells you how to obtain the scalar product of two vectors ($\boldsymbol{X} \cdot \boldsymbol{Y}=\eta_{\mu \nu} X^{\mu} Y^{\nu}$, from eqn 4.1). However, everything so far has been for flat Minkowski spacetime (i.e. for special, not general, relativity). It's time now to tackle gravity and when we include that, spacetime becomes curved. We are working up to Einstein's theory of gravitation that can be succinctly stated as
\begin{equation*}
\binom{\text{Curvature of}}{\text{spacetime}}=\binom{\text{Energy density of}}{\text{matter in spacetime}} \tag{5.1}
\end{equation*}
With curvature included, we need a more general metric and we shall use the symbol $\boldsymbol{g}(\,,\,)$ for our general $(0,2)$ metric tensor [retaining $\boldsymbol{\eta}(\,,\,)$ for the special case of flat Minkowski spacetime]. The metric tensor $\boldsymbol{g}(\,,\,)$ will feed into the left-hand side of eqn 5.1. This chapter will focus on the form of $\boldsymbol{g}(\,,\,)$ for several different types of spacetime.
In flat spacetime, $\boldsymbol{g}(\,,\,)=\boldsymbol{\eta}(\,,\,)$ everywhere throughout spacetime. In curved spacetime, $\boldsymbol{g}(\,,\,)$ varies from point to point. This means that the metric is a field. A field is a quantity where we input a position in spacetime and output a tensor valid for that position in spacetime. For the metric field, we input a position in spacetime and output the appropriate metric tensor that allows us to take dot products at that point in spacetime.${ }^{1}$ Of course the tensors at closely spaced points in spacetime will be related and this relationship gives rise to a field theory: a theory that allows us to describe and predict changes in the fields as a function of space and time and also to examine the consequences the field has on the physics. General relativity is the field theory of gravity. The left-hand side of its governing equation is based on the metric field: the field describing the geometry of spacetime.${ }^{2}$ We begin by restating some simple general facts about metric tensors and their components.

5.1 Metrics in general

For a general space, the metric at a point in spacetime $\boldsymbol{g}(\,,\,)$ is a $(0,2)$ slot machine that takes two vectors as input and outputs their scalar product.${ }^{3}$ Inserting vectors $\boldsymbol{X}$ and $\boldsymbol{Y}$ into the metric, we obtain
\begin{equation*}
\boldsymbol{g}(\boldsymbol{X}, \boldsymbol{Y})=\boldsymbol{X} \cdot \boldsymbol{Y} \tag{5.2}
\end{equation*}
If we prefer to work in components, then these can be extracted by inputting the basis vectors into the metric
\begin{equation*}
\boldsymbol{g}\left(\boldsymbol{e}_{\mu}, \boldsymbol{e}_{\nu}\right)=\boldsymbol{e}_{\mu} \cdot \boldsymbol{e}_{\nu} \equiv g_{\mu \nu} \tag{5.3}
\end{equation*}
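To see eqn 5.3 at work in a familiar setting, here is a minimal sketch in plain Python for plane polar coordinates in flat two-dimensional space. It assumes the coordinate basis vectors $\boldsymbol{e}_{r}$ and $\boldsymbol{e}_{\theta}$ are obtained by differentiating the Cartesian position $(r\cos\theta, r\sin\theta)$ with respect to $r$ and $\theta$; that choice of basis, the function names and the sample point are our own illustrative assumptions.

```python
import math

def polar_basis(r, theta):
    """Coordinate basis vectors e_r and e_theta, written in Cartesian components."""
    e_r = (math.cos(theta), math.sin(theta))
    e_theta = (-r * math.sin(theta), r * math.cos(theta))
    return [e_r, e_theta]

def dot(a, b):
    """Ordinary Euclidean scalar product of two Cartesian vectors."""
    return a[0] * b[0] + a[1] * b[1]

def metric_components(r, theta):
    """g_{mu nu} = e_mu . e_nu, extracted by feeding basis vectors to the metric."""
    basis = polar_basis(r, theta)
    return [[dot(basis[mu], basis[nu]) for nu in range(2)] for mu in range(2)]

g = metric_components(2.0, 0.7)   # hypothetical point (r, theta) = (2, 0.7)
```

Up to rounding, `g` has $g_{rr}=1$, $g_{\theta\theta}=r^{2}$ and vanishing off-diagonal entries, so the metric components depend on where in the space they are evaluated even though the space is flat.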
The metric encodes distances in spacetime via an expression in terms of infinitesimal intervals between coordinates. Previously, we wrote the invariant line element in terms of the Minkowski metric as d s 2 = d s 2 = ds^(2)=\mathrm{d} s^{2}=ds2= η μ ν d x μ d x ν η μ ν d x μ d x ν eta_(mu nu)dx^(mu)dx^(nu)\eta_{\mu \nu} \mathrm{d} x^{\mu} \mathrm{d} x^{\nu}ημνdxμdxν. This hints at a simple way to write down the invariant infinitesimal length of a line element in any space, so we adopt it and write 4 4 ^(4){ }^{4}4
d s 2 = d x d x = g ( d x , d x ) = d x μ d x ν g ( e μ , e ν ) (5.4) = g μ ν d x μ d x ν d s 2 = d x d x = g ( d x , d x ) = d x μ d x ν g e μ , e ν (5.4) = g μ ν d x μ d x ν {:[ds^(2)=dx*dx=g(dx","dx)=dx^(mu)dx^(nu)g(e_(mu),e_(nu))],[(5.4)=g_(mu nu)dx^(mu)dx^(nu)]:}\begin{align*} \mathrm{d} s^{2} & =\mathrm{d} \boldsymbol{x} \cdot \mathrm{~d} \boldsymbol{x}=\boldsymbol{g}(\mathrm{d} \boldsymbol{x}, \mathrm{~d} \boldsymbol{x})=\mathrm{d} x^{\mu} \mathrm{d} x^{\nu} \boldsymbol{g}\left(\boldsymbol{e}_{\mu}, \boldsymbol{e}_{\nu}\right) \\ & =g_{\mu \nu} \mathrm{d} x^{\mu} \mathrm{d} x^{\nu} \tag{5.4} \end{align*}ds2=dx dx=g(dx, dx)=dxμdxνg(eμ,eν)(5.4)=gμνdxμdxν
This equation is a simple statement of how long an infinitesimal interval is in a particular spacetime geometry, specified by the components of the metric. It's often useful to write down this line-element equation and, since it contains all of the components of the metric g μ ν g μ ν g_(mu nu)g_{\mu \nu}gμν, we often say that the line element is the metric. We can integrate d s d s ds\mathrm{d} sds to work out the total interval s s sss between two events. For example, the magnitude of the interval along a curve between events at points A A A\mathcal{A}A and B B B\mathcal{B}B can be worked out using the metric via the prescription
(5.5) s = A B | g μ ν d x μ d x ν | 1 2 (5.5) s = A B g μ ν d x μ d x ν 1 2 {:(5.5)s=int_(A)^(B)|g_(mu nu)dx^(mu)dx^(nu)|^((1)/(2)):}\begin{equation*} s=\int_{\mathcal{A}}^{\mathcal{B}}\left|g_{\mu \nu} \mathrm{d} x^{\mu} \mathrm{d} x^{\nu}\right|^{\frac{1}{2}} \tag{5.5} \end{equation*}(5.5)s=AB|gμνdxμdxν|12
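As a rough numerical illustration of eqn 5.5, the sketch below approximates the interval along a curve by summing $\left|g_{\mu \nu} \Delta x^{\mu} \Delta x^{\nu}\right|^{1/2}$ over many short steps. The two sample curves in Minkowski spacetime, the function names and the step count are illustrative assumptions, not taken from the text; units with $c=1$ are assumed.

```python
import math

def minkowski(x):
    """Components g_{mu nu} at the point x (constant here: flat spacetime, c = 1)."""
    return [[-1.0, 0.0, 0.0, 0.0],
            [0.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def interval_along_curve(metric, curve, lam0, lam1, steps=20000):
    """Approximate s = int |g_{mu nu} dx^mu dx^nu|^(1/2) along curve(lam)."""
    total = 0.0
    h = (lam1 - lam0) / steps
    for i in range(steps):
        a = curve(lam0 + i * h)
        b = curve(lam0 + (i + 1) * h)
        mid = [(p + q) / 2 for p, q in zip(a, b)]   # evaluate the metric at the midpoint
        dx = [q - p for p, q in zip(a, b)]
        g = metric(mid)
        ds2 = sum(g[m][n] * dx[m] * dx[n] for m in range(4) for n in range(4))
        total += math.sqrt(abs(ds2))
    return total

# Spacelike curve: quarter circle of radius 2 at fixed t and z.
quarter_circle = lambda lam: [0.0, 2 * math.cos(lam), 2 * math.sin(lam), 0.0]
arc = interval_along_curve(minkowski, quarter_circle, 0.0, math.pi / 2)

# Timelike curve: straight world line x = 0.6 t from t = 0 to t = 5.
world_line = lambda lam: [lam, 0.6 * lam, 0.0, 0.0]
tau = interval_along_curve(minkowski, world_line, 0.0, 5.0)
```

The modulus sign in eqn 5.5 is what allows the same routine to handle both cases: $\mathrm{d} s^{2}$ is positive along the spacelike arc and negative along the timelike world line.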
Example 5.1
Suppose we have two different coordinate systems. We can use the invariance of the line element to relate the components of the metric together. The line element is written as
\begin{equation*} \mathrm{d} s^{2}=g_{\mu \nu} \mathrm{d} x^{\mu} \mathrm{d} x^{\nu}=g_{\alpha \beta}^{\prime} \mathrm{d} x^{\alpha^{\prime}} \mathrm{d} x^{\beta^{\prime}} \tag{5.6} \end{equation*}
Using the standard transformation law for differentials,$^{5}$
\begin{equation*} g_{\alpha \beta}^{\prime} \mathrm{d} x^{\alpha^{\prime}} \mathrm{d} x^{\beta^{\prime}}=g_{\mu \nu} \frac{\partial x^{\mu}}{\partial x^{\alpha^{\prime}}} \frac{\partial x^{\nu}}{\partial x^{\beta^{\prime}}} \cdot \mathrm{d} x^{\alpha^{\prime}} \mathrm{d} x^{\beta^{\prime}} \tag{5.7} \end{equation*}
We conclude that we have the transformation law
\begin{equation*} g_{\alpha \beta}^{\prime}=g_{\mu \nu} \frac{\partial x^{\mu}}{\partial x^{\alpha^{\prime}}} \cdot \frac{\partial x^{\nu}}{\partial x^{\beta^{\prime}}} . \tag{5.8} \end{equation*}
That is, the components of the metric transform as we expect the components of a $(0,2)$ tensor to transform.
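The transformation law of eqn 5.8 can be tried out on a familiar case. The sketch below is a minimal illustration, assuming the two-dimensional Euclidean plane with unprimed coordinates $(x, y)$ and primed coordinates $(r, \theta)$ related by $x=r \cos \theta$, $y=r \sin \theta$. This choice of example, the function names and the sample point are assumptions made here for illustration.

```python
import math

def jacobian(r, theta):
    """J[mu][alpha] = d x^mu / d x^alpha', with x^mu = (x, y) and x^alpha' = (r, theta)."""
    return [[math.cos(theta), -r * math.sin(theta)],
            [math.sin(theta),  r * math.cos(theta)]]

def transform_metric(g, J):
    """Apply eqn 5.8: g'_{alpha beta} = g_{mu nu} J[mu][alpha] J[nu][beta]."""
    n = len(g)
    return [[sum(g[m][k] * J[m][a] * J[k][b] for m in range(n) for k in range(n))
             for b in range(n)]
            for a in range(n)]

g_cartesian = [[1.0, 0.0], [0.0, 1.0]]      # Euclidean plane, ds^2 = dx^2 + dy^2
r, theta = 2.0, 0.7                          # an arbitrary sample point
g_polar = transform_metric(g_cartesian, jacobian(r, theta))
```

At this sample point the result should be close to $\operatorname{diag}(1, r^{2})$, which is the plane-polar form $\mathrm{d} s^{2}=\mathrm{d} r^{2}+r^{2} \mathrm{~d} \theta^{2}$.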
$^{3}$ We have also seen that it is possible to define the metric as a $(2,0)$ tensor, which is a slot machine that takes two 1-forms as input and outputs a scalar. In components, this gives the 'up' form of the metric $g^{\mu \nu}$. As explained in the previous chapter, the 'up' form of the metric is the inverse of the 'down' form, or
\begin{equation*} g^{\mu \nu} g_{\nu \lambda}=\delta_{\lambda}^{\mu} \end{equation*}
$^{4}$ This is a useful example of the metric tensor acting as a slot machine. If we want to find the squared infinitesimal interval between positions we insert $\mathrm{d} \boldsymbol{x}$ into both slots of the metric tensor and the output is exactly the invariant interval $\mathrm{d} s^{2}$ that we seek.
$^{5}$ Note that the line element in this form features a multiplication of the factors $\mathrm{d} x^{\mu}$. In this expression, it does not represent an infinitesimal area, but rather the square of a length, and so the transformation simply requires a multiplication of the individual transformations. Later in the chapter we shall see how a transformation of a product that does represent an area or volume requires us to consider a Jacobian. This feature is discussed in more detail in Part V of the book (see, in particular, Chapter 38).

5.2 Meet some metrics

With some general rules recapped, let's now meet some metric line elements.$^{6}$
  • We begin with the simplest and most familiar line element. This is the metric for Euclidean space in three dimensions, whose line element is simply
\begin{equation*} \mathrm{d} s^{2}=\mathrm{d} x^{2}+\mathrm{d} y^{2}+\mathrm{d} z^{2} \tag{5.9} \end{equation*}
  • Special relativity is founded on the Minkowski metric in (3+1) dimensions (i.e. a combination of three spatial dimensions and one time dimension). In Cartesian coordinates, the line element for this metric is written as
\begin{equation*} \mathrm{d} s^{2}=-\mathrm{d} t^{2}+\mathrm{d} x^{2}+\mathrm{d} y^{2}+\mathrm{d} z^{2} \tag{5.10} \end{equation*}
We can also take the coordinate components to form a vector with components $\mathrm{d} x^{\mu}=(\mathrm{d} t, \mathrm{~d} x, \mathrm{~d} y, \mathrm{~d} z)$ and write the line element as $\mathrm{d} s^{2}=g_{\mu \nu} \mathrm{d} x^{\mu} \mathrm{d} x^{\nu}$ where
\begin{equation*} g_{\mu \nu}=\left(\begin{array}{cccc} -1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{array}\right) \tag{5.11} \end{equation*}
  • There's nothing stopping us working in cylindrical polar coordinates and describing the same flat Minkowski space using these.$^{7}$ The Minkowski line element in cylindrical polars is therefore
\begin{equation*} \mathrm{d} s^{2}=-\mathrm{d} t^{2}+\mathrm{d} r^{2}+r^{2} \mathrm{~d} \theta^{2}+\mathrm{d} z^{2} \tag{5.15} \end{equation*}
We can also take the coordinate components to form a column vector with components $\mathrm{d} x^{\mu}=(\mathrm{d} t, \mathrm{~d} r, \mathrm{~d} \theta, \mathrm{~d} z)$ and then the line element is $\mathrm{d} s^{2}=g_{\mu \nu} \mathrm{d} x^{\mu} \mathrm{d} x^{\nu}$, where
\begin{equation*} g_{\mu \nu}=\left(\begin{array}{cccc} -1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & r^{2} & 0 \\ 0 & 0 & 0 & 1 \end{array}\right) \tag{5.16} \end{equation*}
  • In the same way, we can write the matrix representing the metric components for flat space in spherical polar coordinates.$^{8}$
\begin{equation*} g_{\mu \nu}=\left(\begin{array}{cccc} -1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & r^{2} & 0 \\ 0 & 0 & 0 & r^{2} \sin ^{2} \theta \end{array}\right) \tag{5.19} \end{equation*}
or
\begin{equation*} \mathrm{d} s^{2}=-\mathrm{d} t^{2}+\mathrm{d} r^{2}+r^{2} \mathrm{~d} \theta^{2}+r^{2} \sin ^{2} \theta \mathrm{~d} \phi^{2} \tag{5.20} \end{equation*}
Example 5.2
Although the space that's described here is still flat, if we fix $r$ we have our first curved space: the two-dimensional surface of a sphere. Consider the surface of a sphere of circumference $2 \pi a$. From the spherical polar example, we need only fix $r=a$ and the metric line element for this two-dimensional surface becomes
\begin{equation*} \mathrm{d} s^{2}=-\mathrm{d} t^{2}+a^{2}\left(\mathrm{~d} \theta^{2}+\sin ^{2} \theta \mathrm{~d} \phi^{2}\right) \tag{5.21} \end{equation*}
  • A final example of a metric is the Newtonian limit of the metric that emerges from general relativity as a limiting case of the Einstein equation. This Newtonian-metric line element is written as
\begin{equation*} \mathrm{d} s^{2}=-(1+2 \Phi) \mathrm{d} t^{2}+(1-2 \Phi)\left(\mathrm{d} x^{2}+\mathrm{d} y^{2}+\mathrm{d} z^{2}\right) \tag{5.22} \end{equation*}
where $\Phi$ is the gravitational potential. We have already motivated the $(1+2 \Phi)$ term multiplying $\mathrm{d} t^{2}$ in eqn 2.75 (Example 2.13) where by working in the limit $v \ll c$ we were only able to obtain the correction to the time-dependent part of the metric. Equation 5.22 has included a very similar correction to the spatial coordinates as well. Note that if the potential $\Phi$ is set to zero, this metric reverts to the Minkowski metric of flat space (eqn 5.10), as expected.
Equation 5.22 can also be expressed by writing down the non-zero components of the metric tensor $g_{00}=-(1+2 \Phi)$, $g_{11}=g_{22}=g_{33}=1-2 \Phi$, or by writing down the components of the tensor as
\begin{equation*} g_{\mu \nu}=\left(\begin{array}{cccc} -(1+2 \Phi) & 0 & 0 & 0 \\ 0 & 1-2 \Phi & 0 & 0 \\ 0 & 0 & 1-2 \Phi & 0 \\ 0 & 0 & 0 & 1-2 \Phi \end{array}\right) \tag{5.23} \end{equation*}
Note that the potential $\Phi$ is assumed to be a function of position, but this limit works for static solutions in which the potential is not time-varying.
  • The Newtonian-limit line element can also be written in spherical polars
\begin{equation*} \mathrm{d} s^{2}=-(1+2 \Phi) \mathrm{d} t^{2}+(1-2 \Phi)\left(\mathrm{d} r^{2}+r^{2} \mathrm{~d} \theta^{2}+r^{2} \sin ^{2} \theta \mathrm{~d} \phi^{2}\right) \tag{5.24} \end{equation*}
Notice how in the previous examples of metrics (other than the simple Minkowski metric line element in Cartesian coordinates) the components of the metric vary in space. That is, the metric is a function of the underlying coordinates. We should therefore write the metric components as a function $g_{\mu \nu}(x)$. The metric is now manifestly an example of a field. The metric field takes a spacetime coordinate $x^{\mu}$ as an input and outputs the metric tensor at that point.$^{9}$
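The idea of the metric as a field, a rule that takes a spacetime point and returns a set of components, can be mimicked directly in code. The sketch below does this for the Newtonian-limit components of eqn 5.23. The point-mass potential $\Phi=-GM / r$, the value of $GM$, the units ($c=1$) and all function names are assumptions introduced purely for illustration.

```python
import math

def potential(x, y, z, GM=1.0e-3):
    """Hypothetical potential of a point mass at the origin, Phi = -GM/r (units with c = 1)."""
    return -GM / math.sqrt(x * x + y * y + z * z)

def newtonian_metric(t, x, y, z, phi=potential):
    """Metric field: input a spacetime point, output the components g_{mu nu} of eqn 5.23."""
    p = phi(x, y, z)
    return [[-(1 + 2 * p), 0.0, 0.0, 0.0],
            [0.0, 1 - 2 * p, 0.0, 0.0],
            [0.0, 0.0, 1 - 2 * p, 0.0],
            [0.0, 0.0, 0.0, 1 - 2 * p]]

g_near = newtonian_metric(0.0, 1.0, 0.0, 0.0)       # r = 1,   Phi = -1e-3
g_far = newtonian_metric(0.0, 1.0e6, 0.0, 0.0)      # r = 1e6, Phi = -1e-9
g_flat = newtonian_metric(0.0, 1.0, 2.0, 3.0, phi=lambda x, y, z: 0.0)
```

With the potential switched off, the function returns the Minkowski components of eqn 5.11 at every point, while `g_near` and `g_far` show the components changing with position.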

$\curvearrowright$ See Chapter 14 for a derivation of eqn 5.22.

$^{9}$ Here's a simple example of the use of the metric using the slot-machine picture. Working in flat space with cylindrical coordinates $X^{\mu}=(t, r, \theta, z)$, the distance between points separated by an interval $\mathrm{d} \boldsymbol{X}$ with coordinates $\mathrm{d} X^{\mu}=(0,0, \mathrm{~d} \theta, 0)$ is found by evaluating
\begin{align*} \mathrm{d} s^{2} & =\boldsymbol{g}(\mathrm{d} \boldsymbol{X}, \mathrm{~d} \boldsymbol{X}) \\ & =g_{\mu \nu} \mathrm{d} X^{\mu} \mathrm{d} X^{\nu} \\ & =g_{\theta \theta} \mathrm{d} \theta^{2}=r^{2} \mathrm{~d} \theta^{2} \end{align*}
Notice how the interval $\mathrm{d} s^{2}$ changes its size depending on the value of $r$ at which we evaluate the interval. This follows from $\boldsymbol{g}$ being a field: we input a position in spacetime and output the interval appropriate for that position. To compute an interval between points $\theta_{1}$ and $\theta_{2}$, separated by a larger interval (i.e. the separation is not infinitesimal), we then evaluate
\begin{equation*} \Delta s=\int \mathrm{d} s=\int_{\theta_{1}}^{\theta_{2}} \sqrt{g_{\theta \theta}} \mathrm{d} \theta=\int_{\theta_{1}}^{\theta_{2}} r \mathrm{~d} \theta \end{equation*}
We'll use these ideas in the following sections.
$^{10}$ In fact, there are two different ways of visualizing a metric: you can draw its light cones, or embed a slice of it in a higher dimensional space. The latter approach is discussed in Appendix D.

Fig. 5.1 Light cones in (a) flat spacetime, (b) Rindler spacetime and (c) baby-Eddington-Finkelstein coordinates.

5.3 Light and light cones

One way of visualizing a metric is to draw the light cones in the spacetime that it describes. Light cones are absolute surfaces that separate timelike and spacelike intervals. We use local light cones to visualize spaces because, in curved spacetime, the light cones change their orientation as a function of position in spacetime. The pattern of local light cones represents almost all of the structure of spacetime.$^{10}$ Recall that the infinitesimal interval between two events on the photon's world line satisfies
\begin{equation*} \mathrm{d} s^{2}=0 \tag{5.27} \end{equation*}
Using this condition, we can work out the orientation of the light cones in whichever coordinates we've chosen to use. The orientation is important as it reveals the signals that observers can send and receive at each point. We investigate some examples below.
Example 5.3
(a) Light in flat Minkowski spacetime obeys
\begin{equation*} \mathrm{d} s^{2}=-\mathrm{d} t^{2}+\mathrm{d} x^{2}+\mathrm{d} y^{2}+\mathrm{d} z^{2}=0 \tag{5.28} \end{equation*}
from which we find
\begin{equation*} \frac{\mathrm{d} t}{\mathrm{~d}|\vec{r}|}= \pm 1 \tag{5.29} \end{equation*}
and so $\mathrm{d} t= \pm \mathrm{d}|\vec{r}|$, which is integrated to give an equation for the light cones emerging from a point $\left(t_{0}, r_{0}\right)$, with the result that light cones obey the coordinate equation
\begin{equation*} t-t_{0}= \pm\left(|\vec{r}|-\left|\vec{r}_{0}\right|\right) \tag{5.30} \end{equation*}
The light cones look the same everywhere [see Fig. 5.1(a), where we just draw the forward light cones for simplicity]. This uniformity of the light cones throughout spacetime is exactly as we stated earlier [see Fig. 3(a)]. A useful innovation at this point is the introduction of the so-called light-cone coordinates
\begin{equation*} u=t-|\vec{r}|, \quad v=t+|\vec{r}| \tag{5.31} \end{equation*}
giving
\begin{equation*} \mathrm{d} s^{2}=-\mathrm{d} u \mathrm{~d} v \tag{5.32} \end{equation*}
from which we can immediately read off that the light cones are coincident with the lines of constant $u$ and $v$ [see Fig. 5.1(a)].
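The step from eqn 5.31 to eqn 5.32 can be filled in explicitly. The short derivation below is a sketch that assumes purely radial motion, so that the angular terms drop out and $\mathrm{d} x^{2}+\mathrm{d} y^{2}+\mathrm{d} z^{2}$ reduces to $\mathrm{d}|\vec{r}|^{2}$. That restriction is an assumption made here rather than something stated in the text.

```latex
\begin{align*}
\mathrm{d}u &= \mathrm{d}t-\mathrm{d}|\vec{r}|, \qquad \mathrm{d}v=\mathrm{d}t+\mathrm{d}|\vec{r}|, \\
\mathrm{d}u\,\mathrm{d}v &= \mathrm{d}t^{2}-\mathrm{d}|\vec{r}|^{2}, \\
\mathrm{d}s^{2} &= -\mathrm{d}t^{2}+\mathrm{d}|\vec{r}|^{2} = -\mathrm{d}u\,\mathrm{d}v .
\end{align*}
```

Setting $\mathrm{d} s^{2}=0$ then requires either $\mathrm{d} u=0$ or $\mathrm{d} v=0$, which is why the light rays run along lines of constant $u$ or constant $v$.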
Let's try other spacetimes. (For this particular exercise, we simply pluck these from thin air, without derivation.)
(b) The first one is known as Rindler spacetime (it comes from a coordinate choice appropriate for accelerated observers) and has a metric line element
\begin{equation*} \mathrm{d} s^{2}=-x^{2} \mathrm{~d} t^{2}+\mathrm{d} x^{2} \tag{5.33} \end{equation*}
Setting $\mathrm{d} s^{2}=0$ for light, we have the equation of the light cone
\begin{equation*} \frac{\mathrm{d} t}{\mathrm{~d} x}= \pm \frac{1}{x} \tag{5.34} \end{equation*}
Integrating, we find
\begin{equation*} t-t_{0}= \pm \ln |x| \tag{5.35} \end{equation*}
We see that the cones change their shape as a function of position as shown in Fig. 5.1(b).
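The Rindler result can be checked numerically. The following sketch integrates the slope of eqn 5.34 along an outgoing ray and compares the outcome with the logarithm of eqn 5.35. The starting point $x=1$ (chosen so that the answer is simply $\ln |x|$), the end point, the step count and the function names are illustrative assumptions.

```python
import math

def rindler_null_slope(x, sign=+1):
    """dt/dx on the light cone of ds^2 = -x^2 dt^2 + dx^2 (eqn 5.34)."""
    return sign / x

def integrate_null_ray(x_start, x_end, steps=100000):
    """Midpoint-rule integral of dt/dx along the outgoing (+) ray; returns t - t0."""
    h = (x_end - x_start) / steps
    return sum(rindler_null_slope(x_start + (i + 0.5) * h) * h for i in range(steps))

def ds2(x, dt, dx):
    """Rindler line element evaluated for a small displacement (dt, dx) at position x."""
    return -x * x * dt * dt + dx * dx

elapsed = integrate_null_ray(1.0, 3.0)      # ray emitted from x = 1, received at x = 3
slopes = [rindler_null_slope(x) for x in (0.5, 1.0, 2.0)]
```

The list `slopes` makes the position dependence explicit: in the $(x, t)$ plane the cone edges are steeper at small $x$ and shallower at large $x$.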
(c) Let's try another spacetime. This one has a metric line element we'll call the baby-Eddington-Finkelstein metric
\begin{equation*} \mathrm{d} s^{2}=-x \mathrm{~d} v^{2}+2 \mathrm{~d} v \mathrm{~d} x \tag{5.36} \end{equation*}
Light cones must cause $\mathrm{d} s^{2}$ to vanish and so we spot that they have $v=$ constant and also $\mathrm{d} v / \mathrm{d} x=2 / x$, so we find
\begin{equation*} v-v_{0}=2 \ln |x| \tag{5.37} \end{equation*}
The light cones are shown in Fig. 5.1(c).
(d) Finally, we consider the slightly more complicated Eddington-Finkelstein metric$^{11}$
\begin{equation*} \mathrm{d} s^{2}=-\left(1-\frac{2 G M}{r}\right) \mathrm{d} v^{2}+2 \mathrm{~d} v \mathrm{~d} r \tag{5.38} \end{equation*}
Light cones have $v=$ const again and also
\begin{equation*} \frac{\mathrm{d} v}{\mathrm{~d} r}=2\left(1-\frac{2 G M}{r}\right)^{-1} \tag{5.39} \end{equation*}
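To get a feel for eqn 5.39, the sketch below evaluates the slope $\mathrm{d} v / \mathrm{d} r$ at a few radii and confirms that the corresponding displacement makes the line element of eqn 5.38 vanish. Units with $G M=1$, the sample radii and the function names are assumptions chosen for illustration.

```python
def ef_null_slope(r, GM=1.0):
    """dv/dr for the second family of null rays of eqn 5.38 (the first has v = const)."""
    return 2.0 / (1.0 - 2.0 * GM / r)

def ef_ds2(r, dv, dr, GM=1.0):
    """Line element of eqn 5.38 for a small displacement (dv, dr) at radius r."""
    return -(1.0 - 2.0 * GM / r) * dv * dv + 2.0 * dv * dr

outside = ef_null_slope(4.0)     # r > 2GM
inside = ef_null_slope(1.0)      # r < 2GM
far = ef_null_slope(1.0e9)       # r >> 2GM
```

The slope tends to 2 far from the origin, grows without bound as $r \rightarrow 2 G M$ from above, and is negative for $r<2 G M$, so the orientation of the cones changes character at $r=2 G M$.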
Being able to visualize spacetimes in this way allows us to understand some exotic spacetimes, such as that of the next example.
Example 5.4
The Alcubierre metric$^{12}$ is an attempt to build the spacetime that would result from the action of a warp drive. A warp drive, much discussed in science fiction, is a device that appears to allow faster-than-light travel (judged from the point of view of an observer in flat spacetime). Of course, the propagation of signals (and travellers) faster than $c$ is not allowed. Instead, the warp drive works by making a bubble of curved spacetime where the light cones are oriented differently to those in flat space. The mathematical construction of the bubble starts with a curve $x=x_{\mathrm{s}}(t), y=z=0$, which has a tangent $v_{\mathrm{s}}=\mathrm{d} x / \mathrm{d} t$. Now we construct a smooth bubble function $f\left(r_{\mathrm{s}}\right)$, where $r_{\mathrm{s}}=\sqrt{\left(x-x_{\mathrm{s}}\right)^{2}+y^{2}+z^{2}}$. By construction, this function has the property $f(0)=1$ and decreases as we move from the origin, vanishing for $r_{\mathrm{s}}>R$, where $R$ is some distance that sets the edge of the bubble. The Alcubierre metric is then written as
\begin{equation*} \mathrm{d} s^{2}=-\left[1-v_{\mathrm{s}}(t)^{2} f\left(r_{\mathrm{s}}\right)^{2}\right] \mathrm{d} t^{2}-2 v_{\mathrm{s}}(t) f\left(r_{\mathrm{s}}\right) \mathrm{d} x \mathrm{~d} t+\mathrm{d} x^{2}+\mathrm{d} y^{2}+\mathrm{d} z^{2} \tag{5.40} \end{equation*}
where $v_{\mathrm{s}}=\frac{\mathrm{d} x_{\mathrm{s}}}{\mathrm{d} t}$. This can be simplified to read
\begin{equation*} \mathrm{d} s^{2}=-\mathrm{d} t^{2}+\left[\mathrm{d} x-v_{\mathrm{s}}(t) f\left(r_{\mathrm{s}}\right) \mathrm{d} t\right]^{2}+\mathrm{d} y^{2}+\mathrm{d} z^{2} \tag{5.41} \end{equation*}
This all looks rather complicated, but is best understood using Fig. 5.2. Figure 5.2(a) shows the world line of the traveller between two distant points in spacetime with different values of the coordinate $x$. The warp drive creates a bubble in spacetime shown by the dotted region. We can examine the light-cone structure inside the bubble by setting $\mathrm{d} s^{2}=0$. We find that the light cones are given by
\begin{equation*} \frac{\mathrm{d} x}{\mathrm{~d} t}= \pm 1+v_{\mathrm{s}}(t) f\left(r_{\mathrm{s}}\right) \tag{5.42} \end{equation*}
In regions outside the bubble, where $f=0$, the light cones are the normal ones of flat spacetime. Inside the bubble, the function $f$ causes the light cones to tip over, as shown in Fig. 5.2(b). The traveller must always move inside her forward light cone, but we see that inside the bubble, the tipping of the light cones means that the tangent of the world line appears to give rise to a velocity greater than $c$, judged by the light cones outside of the warp bubble. As a result of the warp in spacetime, the traveller is able to travel vast distances that would require superluminal velocities in flat spacetime.$^{13}$
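The tipping of the light cones can be made concrete with a small numerical sketch based on eqns 5.41 and 5.42, restricted to the axis $y=z=0$. The bubble profile used here, $f=\cos ^{2}\left(\pi r_{\mathrm{s}} / 2 R\right)$ for $r_{\mathrm{s}}<R$ and zero otherwise, is a stand-in chosen only because it satisfies $f(0)=1$ and vanishes at the edge; it is not the profile used in the original proposal. The bubble speed, the sample positions and the function names are likewise assumptions, and units with $c=1$ are used.

```python
import math

def bubble(r_s, R=1.0):
    """Illustrative bubble profile (an assumption): f(0) = 1, falls smoothly to 0 at r_s = R."""
    if r_s >= R:
        return 0.0
    return math.cos(0.5 * math.pi * r_s / R) ** 2

def cone_slopes(x, x_s, v_s, R=1.0):
    """Eqn 5.42 on the axis y = z = 0: dx/dt = +1 + v_s f and -1 + v_s f."""
    shift = v_s * bubble(abs(x - x_s), R)
    return (1.0 + shift, -1.0 + shift)

def alcubierre_ds2(dt, dx, x, x_s, v_s, R=1.0):
    """Eqn 5.41 with dy = dz = 0."""
    return -dt * dt + (dx - v_s * bubble(abs(x - x_s), R) * dt) ** 2

v_s = 2.0                                   # bubble moving at twice c (units with c = 1)
centre = cone_slopes(0.0, 0.0, v_s)         # at the bubble centre
wall = cone_slopes(0.5, 0.0, v_s)           # halfway to the edge
outside = cone_slopes(3.0, 0.0, v_s)        # well outside the bubble
```

At the centre of this illustrative bubble both edges of the cone have positive $\mathrm{d} x / \mathrm{d} t$ (3 and 1), so a traveller moving with the bubble at $\mathrm{d} x / \mathrm{d} t=v_{\mathrm{s}}=2$ sits inside her local light cone even though that speed exceeds the flat-space value of 1 found outside.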
$^{11}$ We shall see this again when we examine the geometry of stars. It arises in the theory of spherically symmetric black holes.
$^{12}$ Miguel Alcubierre (1964-). The warp drive proposal originated in M. Alcubierre, Class. Quantum Grav. 11, L73 (1994).
Fig. 5.2 (a) The world line of a trip in spacetime from $\left(t_{1}, \vec{x}_{1}\right)$ to $\left(t_{2}, \vec{x}_{2}\right)$, surrounded by a warped bubble of spacetime. (b) The light-cone structure in the warped region of spacetime. The light cones tip over in the warped region.
$^{13}$ It's worth noting that warping spacetime in this way would require a source of negative energy, so is not something that is readily achievable!

5.4 Lengths, areas, volumes

Once we have a metric in the form of a line element, we can use it to calculate not only lengths (or intervals), but also areas and volumes.

Example 5.5

Consider the metric for flat space, given in terms of spherical polars, with line element $\mathrm{d} s^{2}=-\mathrm{d} t^{2}+\mathrm{d} r^{2}+r^{2} \mathrm{~d} \theta^{2}+r^{2} \sin ^{2} \theta \mathrm{~d} \phi^{2}$.$^{14}$ We can use this to find the radius of a sphere. We fix $t=\theta=\phi=$ constant so that $\mathrm{d} t=\mathrm{d} \theta=\mathrm{d} \phi=0$, with the result that $\mathrm{d} s^{2}=\mathrm{d} r^{2}$. We then integrate the line element $\mathrm{d} s$ from $r=0$ to $r=R$
\begin{equation*} \int_{0}^{R} \mathrm{~d} s=\int_{0}^{R} \mathrm{~d} r=R \tag{5.43} \end{equation*}
This is no surprise, but shows the general principle of how to manipulate the metric to extract a length. In the same way, we can find the radius and circumference of a circle (Fig. 5.3) which has a fixed value of $\theta=\theta_{0}$ on the sphere. First, let's work out $r_{0}$ which is the distance from the North pole to the circle. We start by fixing $t$, $r$ and $\phi$ (so that $\mathrm{d} s^{2}=R^{2} \mathrm{~d} \theta^{2}$) and then integrating $\mathrm{d} s$ to find
\begin{equation*} \int_{\theta=0}^{\theta_{0}} \mathrm{~d} s=\int_{\theta=0}^{\theta_{0}} \sqrt{g_{\theta \theta}(r=R)} \mathrm{d} \theta=\int_{\theta=0}^{\theta_{0}} R \mathrm{~d} \theta=\theta_{0} R \tag{5.44} \end{equation*}
Again, an unsurprising result (elementary geometry tells us that $\theta_{0} R=r_{0}$). The circumference of this circle is calculated using $\mathrm{d} s^{2}=g_{\phi \phi}\left(r=R, \theta=\theta_{0}\right) \mathrm{d} \phi^{2}=R^{2} \sin ^{2} \theta_{0} \mathrm{~d} \phi^{2}$ and so
\begin{equation*} \int_{\phi=0}^{2 \pi} \mathrm{~d} s=\int_{\phi=0}^{2 \pi} \sqrt{g_{\phi \phi}\left(R, \theta_{0}\right)} \mathrm{d} \phi=2 \pi R \sin \theta_{0}=2 \pi r_{0} \operatorname{sinc} \frac{r_{0}}{R} \tag{5.45} \end{equation*}
agreeing with eqn 3.20.
$^{14}$ Recall that we can write this $\mathrm{d} s^{2}=g_{t t} \mathrm{~d} t^{2}+g_{r r} \mathrm{~d} r^{2}+g_{\theta \theta} \mathrm{d} \theta^{2}+g_{\phi \phi} \mathrm{d} \phi^{2}$. In this example, we use the fact that if $\mathrm{d} t=\mathrm{d} \theta=\mathrm{d} \phi=0$, then $\mathrm{d} s=\sqrt{g_{r r}} \mathrm{~d} r$, and so on.
Fig. 5.3 A sphere of radius $R$. A circle on the surface of the sphere has a radius $r_{0}$ and is a line of fixed $\theta=\theta_{0}$.
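The two lengths found in Example 5.5 can be reproduced by integrating $\sqrt{g_{\theta \theta}}$ and $\sqrt{g_{\phi \phi}}$ numerically, which is a useful habit for metrics where the integral cannot be done by hand. In the sketch below the sphere radius, the angle $\theta_{0}$, the midpoint-rule integrator and all names are illustrative assumptions.

```python
import math

def g_theta_theta(r, theta):
    """Spherical-polar metric component g_{theta theta} = r^2."""
    return r * r

def g_phi_phi(r, theta):
    """Spherical-polar metric component g_{phi phi} = r^2 sin^2(theta)."""
    return (r * math.sin(theta)) ** 2

def integrate(f, a, b, steps=10000):
    """Midpoint rule for the integral of f from a to b."""
    h = (b - a) / steps
    return sum(f(a + (i + 0.5) * h) for i in range(steps)) * h

R, theta0 = 3.0, 0.8                       # sample sphere radius and polar angle of the circle

# Eqn 5.44: distance from the North pole to the circle, along a line of fixed phi.
r0 = integrate(lambda th: math.sqrt(g_theta_theta(R, th)), 0.0, theta0)

# Eqn 5.45: circumference of the circle theta = theta0.
circumference = integrate(lambda ph: math.sqrt(g_phi_phi(R, theta0)), 0.0, 2 * math.pi)

sinc = lambda x: math.sin(x) / x           # unnormalised sinc, as used in eqn 5.45
```

The circumference comes out smaller than the flat-space value $2 \pi r_{0}$, which is the signature of the sphere's curvature contained in the sinc factor of eqn 5.45.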
We notice from these examples that the (proper) length of an infinitesimal segment of coordinate $x^{1}$ is given by $\mathrm{d} l^{1}=\sqrt{g_{11}} \mathrm{~d} x^{1}$. We can use this fact to work out how to calculate areas and volumes.

Example 5.6

Consider the special case of a diagonal$^{15}$ metric, with line element
\begin{equation*} \mathrm{d} s^{2}=g_{00}\left(\mathrm{d} x^{0}\right)^{2}+g_{11}\left(\mathrm{d} x^{1}\right)^{2}+g_{22}\left(\mathrm{d} x^{2}\right)^{2}+g_{33}\left(\mathrm{d} x^{3}\right)^{2} \tag{5.46} \end{equation*}
An element of area is written as$^{16}$
\begin{align*} \mathrm{d} A & =\mathrm{d} l^{2}\, \mathrm{d} l^{3} \\ & =\sqrt{g_{22} g_{33}}\, \mathrm{d} x^{2}\, \mathrm{d} x^{3} \tag{5.47} \end{align*}
$^{15}$ [...] matrix of metric components $g_{\mu \nu}$. A diagonal metric has a line element that features the squares of intervals such as $\left(\mathrm{d} x^{\mu}\right)^{2}$, but not mixed components such as $\mathrm{d} x^{\mu} \mathrm{d} x^{\nu}$ with $\mu \neq \nu$.
$^{16}$ This is one example of an area. We can also form areas from other pairs of coordinates. For our example of a sphere, this choice turns out to be a sensible one since it yields an element of surface area.
Similarly, an element of 3-volume is written as
\begin{align*} \mathrm{d} V & =\mathrm{d} l^{1}\, \mathrm{d} l^{2}\, \mathrm{d} l^{3} \tag{5.48}\\ & =\sqrt{g_{11} g_{22} g_{33}}\, \mathrm{d} x^{1}\, \mathrm{d} x^{2}\, \mathrm{d} x^{3}. \end{align*}
As an example, let's consider the metric in spherical polar coordinates. The area element is, using eqn 5.47, given by
\begin{equation*} \mathrm{d} A=r^{2} \sin \theta\, \mathrm{d} \theta\, \mathrm{d} \phi \tag{5.49} \end{equation*}
and the volume, using eqn 5.48, is
\begin{equation*} \mathrm{d} V=r^{2} \sin \theta\, \mathrm{d} r\, \mathrm{d} \theta\, \mathrm{d} \phi \tag{5.50} \end{equation*}
For an element of 4-volume [i.e. a volume of (3+1)-dimensional spacetime], we need to take account of the fact that the timelike component of all Lorentz metrics comes with a minus sign. We therefore write
\begin{align*} \mathrm{d} \mathcal{V} & =\mathrm{d} l^{0}\, \mathrm{d} l^{1}\, \mathrm{d} l^{2}\, \mathrm{d} l^{3} \\ & =\sqrt{-g_{00} g_{11} g_{22} g_{33}}\, \mathrm{d} x^{0}\, \mathrm{d} x^{1}\, \mathrm{d} x^{2}\, \mathrm{d} x^{3} \tag{5.51} \end{align*}
and since $g_{00}$ is negative, we take the square root of a positive quantity. Note that the product $g_{00} g_{11} g_{22} g_{33}$ is the determinant of the diagonal metric matrix.$^{17}$
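The spherical-polar area and volume elements of eqns 5.49 and 5.50 can be checked by integrating them numerically. The sketch below uses a simple midpoint rule; the function names, the radius and the number of steps are assumptions chosen for illustration. It should reproduce the familiar $4\pi R^{2}$ and $\frac{4}{3}\pi R^{3}$ to within the accuracy of the rule.

```python
import math

def sphere_area(R, n=2000):
    """Integrate dA = R^2 sin(theta) dtheta dphi over the whole sphere."""
    dtheta = math.pi / n
    # The integrand does not depend on phi, so the phi integral contributes 2*pi.
    theta_integral = math.fsum(
        math.sin((i + 0.5) * dtheta) * dtheta for i in range(n)
    )
    return R**2 * theta_integral * 2 * math.pi

def ball_volume(R, n=2000):
    """Integrate dV = r^2 sin(theta) dr dtheta dphi over a ball of radius R."""
    dr = R / n
    radial_integral = math.fsum(((i + 0.5) * dr) ** 2 * dr for i in range(n))
    # The angular part of the integral is the area of the unit sphere.
    return radial_integral * sphere_area(1.0, n)

R = 1.5                                   # illustrative radius (assumption)
area, volume = sphere_area(R), ball_volume(R)
print(area, 4 * math.pi * R**2)
print(volume, 4 / 3 * math.pi * R**3)
```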
We now show that this result generalizes to cases where we don't have a diagonal metric, which is to say that an element of $(3+1)$-dimensional volume, known as a 4-volume, is given by
\begin{equation*} \mathrm{d} \mathcal{V}=\sqrt{-\operatorname{det} \boldsymbol{g}}\, \mathrm{d}^{4} x \tag{5.53} \end{equation*}
where we note that the determinant $\operatorname{det} \boldsymbol{g}$ is often simply written $g$, so we would write $\mathrm{d} \mathcal{V}=\sqrt{-g}\, \mathrm{d}^{4} x$, with $\mathrm{d}^{4} x=\mathrm{d} x^{0}\, \mathrm{d} x^{1}\, \mathrm{d} x^{2}\, \mathrm{d} x^{3}$. As we shall see below, the 4-volume $\mathrm{d} \mathcal{V}$ is an invariant.
To prove that the 4-volume is an invariant, let's note that volumes transform using an object that is called the Jacobian.$^{18}$ For our case of $(3+1)$-dimensional spacetime, the Jacobian is given by the determinant of the transformation matrix and so
\begin{equation*} J=\frac{\partial\left(x^{0^{\prime}}, x^{1^{\prime}}, x^{2^{\prime}}, x^{3^{\prime}}\right)}{\partial\left(x^{0}, x^{1}, x^{2}, x^{3}\right)}=\operatorname{det}\left(\Lambda_{\beta}^{\alpha^{\prime}}\right) \tag{5.55} \end{equation*}
Volume elements in different coordinate systems are then transformed using the Jacobian by writing
\begin{equation*} \mathrm{d} x^{0^{\prime}} \mathrm{d} x^{1^{\prime}} \mathrm{d} x^{2^{\prime}} \mathrm{d} x^{3^{\prime}}=\frac{\partial\left(x^{0^{\prime}}, x^{1^{\prime}}, x^{2^{\prime}}, x^{3^{\prime}}\right)}{\partial\left(x^{0}, x^{1}, x^{2}, x^{3}\right)}\, \mathrm{d} x^{0}\, \mathrm{d} x^{1}\, \mathrm{d} x^{2}\, \mathrm{d} x^{3} \tag{5.56} \end{equation*}
We can now prove the general rule for finding the volume of an element of 4-space. We assume that the volume on the left is the usual volume of an infinitesimal element in a Cartesian flat space. We need to show that the Jacobian (i.e. the determinant of the transformation matrix) is equal to $\sqrt{-g}$. The argument proceeds from a principle called local flatness.$^{19}$ An observer will perceive spacetime to be flat at the point at which they reside, in much the same way that (unless you are living on a hillside) we tend to perceive the Earth as locally flat and only abandon our locally Euclidean street-maps when we look further afield. The following example fills in the details.

Example 5.7

Proof: Arguing from local flatness, we transform the Minkowski metric tensor $\boldsymbol{\eta}$ (for an observer's locally flat spacetime) into a general tensor $\boldsymbol{g}$ (for curved spacetime) at the particular point in spacetime of the observer's location. This can be done using$^{20}$
\begin{equation*} \underline{\boldsymbol{g}}=\underline{\boldsymbol{\Lambda}}\, \underline{\boldsymbol{\eta}}\, \underline{\boldsymbol{\Lambda}}^{\mathrm{T}} \tag{5.58} \end{equation*}
$^{17}$ The determinant of an $n \times n$ matrix $\boldsymbol{A}$ is computed using the rule
\begin{equation*} \operatorname{det} \underline{\boldsymbol{A}}=\sum_{i_{1}, i_{2}, \ldots, i_{n}=1}^{n} \varepsilon_{i_{1} \ldots i_{n}} A_{1, i_{1}} \ldots A_{n, i_{n}}, \tag{5.52} \end{equation*}
where $\varepsilon_{i_{1} \ldots i_{n}}$ is the Levi-Civita symbol, which is defined as $\varepsilon_{i_{1} \ldots i_{n}}=1$ for an even permutation of the indices and $=-1$ for an odd permutation. If the matrix is diagonal, the determinant is simply the product of the $n$ non-zero elements.
$^{18}$ The Jacobian is named after the German mathematician C. G. J. Jacobi (1804-1851). A general coordinate transformation may be written $x^{\mu^{\prime}}=x^{\mu^{\prime}}\left(x^{1}, x^{2}, x^{3}, \ldots, x^{n}\right)$ where $(\mu=1,2, \ldots, n)$. If we now arrange the $n \times n$ partial derivatives $\partial x^{\mu^{\prime}} / \partial x^{\nu}$ into the transformation matrix
\begin{align*} \Lambda_{\nu}^{\mu^{\prime}} & =\frac{\partial x^{\mu^{\prime}}}{\partial x^{\nu}} \tag{5.54}\\ & =\left(\begin{array}{ccc} \frac{\partial x^{1^{\prime}}}{\partial x^{1}} & \ldots & \frac{\partial x^{1^{\prime}}}{\partial x^{n}} \\ \vdots & \ddots & \vdots \\ \frac{\partial x^{n^{\prime}}}{\partial x^{1}} & \ldots & \frac{\partial x^{n^{\prime}}}{\partial x^{n}} \end{array}\right) \end{align*}
then the Jacobian $J$ of the transformation is defined as the determinant of this matrix. In symbols, we write
\begin{equation*} J=\frac{\partial\left(x^{1^{\prime}} \ldots x^{n^{\prime}}\right)}{\partial\left(x^{1} \ldots x^{n}\right)}=\operatorname{det}\left(\frac{\partial x^{\mu^{\prime}}}{\partial x^{\nu}}\right) \end{equation*}
where we've introduced the commonly used notation for $J$. The Jacobian tells us how volume elements transform (and is discussed in more detail in Chapter 38).
$^{19}$ The concept of local flatness is explained in more detail in Chapter 6 and will be used regularly from here onwards.
$^{20}$ We write a matrix equation here where $\underline{\boldsymbol{\eta}}$ is the Minkowski matrix (i.e. the matrix with components $\eta_{\mu \nu}$) and $\underline{\boldsymbol{\Lambda}}$ is a matrix with components $\Lambda_{\nu}^{\mu}$. The equation therefore is simply a rewriting of eqn 5.8:
\begin{equation*} g_{\alpha \beta}=\Lambda_{\alpha}^{\mu} \Lambda_{\beta}^{\nu} \eta_{\mu \nu} \tag{5.57} \end{equation*}
Taking determinants we find
\begin{equation*} \operatorname{det} \underline{\boldsymbol{g}}=\operatorname{det} \underline{\boldsymbol{\Lambda}}\, \operatorname{det} \underline{\boldsymbol{\eta}}\, \operatorname{det} \underline{\boldsymbol{\Lambda}}^{\mathrm{T}} \tag{5.59} \end{equation*}
However, $\operatorname{det} \underline{\boldsymbol{A}}=\operatorname{det} \underline{\boldsymbol{A}}^{\mathrm{T}}$ and also $\operatorname{det} \underline{\boldsymbol{\eta}}=-1$. We conclude therefore that
\begin{equation*} \operatorname{det} \underline{\boldsymbol{g}}=-(\operatorname{det} \underline{\boldsymbol{\Lambda}})^{2} \end{equation*}
or, writing $\operatorname{det} \underline{\boldsymbol{g}}=g$,
\begin{equation*} \operatorname{det} \underline{\boldsymbol{\Lambda}}=(-g)^{\frac{1}{2}} \tag{5.61} \end{equation*}
The result is that the elementary volume in flat Cartesian spacetime (the primed coordinates) is given by
\begin{align*} \mathrm{d} x^{0^{\prime}} \mathrm{d} x^{1^{\prime}} \mathrm{d} x^{2^{\prime}} \mathrm{d} x^{3^{\prime}} & =\frac{\partial\left(x^{0^{\prime}}, x^{1^{\prime}}, x^{2^{\prime}}, x^{3^{\prime}}\right)}{\partial\left(x^{0}, x^{1}, x^{2}, x^{3}\right)}\, \mathrm{d} x^{0}\, \mathrm{d} x^{1}\, \mathrm{d} x^{2}\, \mathrm{d} x^{3} \\ & =(-g)^{\frac{1}{2}}\, \mathrm{d} x^{0}\, \mathrm{d} x^{1}\, \mathrm{d} x^{2}\, \mathrm{d} x^{3} \tag{5.62} \end{align*}
The right-hand side of this equation is just the invariant 4-volume $\mathrm{d} \mathcal{V}$ that we met in eqn 5.53. Thus, the volume of an infinitesimal element in locally flat Cartesian space is equal to the invariant 4-volume, nicely illustrating the principle of local flatness.
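To see the determinant argument at work in a concrete case, the following sketch takes flat spacetime with Cartesian coordinates $(t, x, y, z)$ as the primed system and spherical polars $(t, r, \theta, \phi)$ as the unprimed one. It builds the matrix of partial derivatives, forms the metric components from eqn 5.57 and checks numerically that $\operatorname{det} \underline{\boldsymbol{g}} = -(\operatorname{det} \underline{\boldsymbol{\Lambda}})^{2}$. The choice of coordinates, the sample point and all function names are assumptions made for illustration.

```python
import math

def det(m):
    """Determinant by Laplace expansion along the first row (fine for 4x4)."""
    if len(m) == 1:
        return m[0][0]
    total = 0.0
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total

def jacobian_matrix(r, theta, phi):
    """lam[mu'][nu] = d x^{mu'} / d x^{nu}: Cartesian (t,x,y,z) w.r.t. (t,r,theta,phi)."""
    st, ct = math.sin(theta), math.cos(theta)
    sp, cp = math.sin(phi), math.cos(phi)
    return [
        [1.0, 0.0, 0.0, 0.0],
        [0.0, st * cp, r * ct * cp, -r * st * sp],
        [0.0, st * sp, r * ct * sp, r * st * cp],
        [0.0, ct, -r * st, 0.0],
    ]

ETA = [[-1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0],
       [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]

def metric_from_jacobian(lam):
    """g_{alpha beta} = Lambda^mu_alpha Lambda^nu_beta eta_{mu nu} (eqn 5.57)."""
    idx = range(4)
    return [[sum(lam[mu][a] * lam[nu][b] * ETA[mu][nu] for mu in idx for nu in idx)
             for b in idx] for a in idx]

r, theta, phi = 2.0, 0.9, 0.4             # illustrative point (assumption)
lam = jacobian_matrix(r, theta, phi)
g = metric_from_jacobian(lam)
det_g, det_lam = det(g), det(lam)
print(det_g, -det_lam**2)                 # these should agree
print(math.sqrt(-det_g), r**2 * math.sin(theta))
```

In this case $\sqrt{-g}$ reduces to $r^{2} \sin \theta$, the factor that appears in the volume element of eqn 5.50.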

Chapter summary

  • The metric is a field that encodes the geometry of spacetime. It allows us to compute intervals between events. The $(0,2)$ metric tensor takes two vectors and outputs a number
\begin{equation*} \boldsymbol{g}(\boldsymbol{X}, \boldsymbol{Y})=g_{\mu \nu} X^{\mu} Y^{\nu} \tag{5.63} \end{equation*}
  • The components of a metric are often given via a line element
\begin{equation*} \mathrm{d} s^{2}=g_{\mu \nu}\, \mathrm{d} x^{\mu}\, \mathrm{d} x^{\nu} \tag{5.64} \end{equation*}
  • A metric can be visualized by working out its light cone structure. Light cones are defined by $\mathrm{d} s^{2}=0$.
  • The invariant volume element is given by
\begin{equation*} \mathrm{d} \mathcal{V}=\sqrt{-g}\, \mathrm{d}^{4} x \tag{5.65} \end{equation*}
where $g$ is the determinant of the metric tensor $\boldsymbol{g}$.
  • The principle of local flatness says that an observer will perceive spacetime to be flat Minkowski spacetime at the point at which they reside.

Exercises

(5.1) By finding the determinant $g=\operatorname{det} \boldsymbol{g}$ of the relevant metric, compute the invariant volume element for the following systems:
(a) A flat two-dimensional surface in cylindrical coordinates.
(b) A two-dimensional spherical surface in spherical coordinates.
(c) The two-dimensional surface of a torus.
Hint: A torus has a line element
\begin{equation*} \mathrm{d} s^{2}=(c+a \cos v)^{2}\, \mathrm{d} u^{2}+a^{2}\, \mathrm{d} v^{2} \tag{5.66} \end{equation*}
(5.2) Show that the Rindler metric
\begin{equation*} \mathrm{d} s^{2}=-x^{2}\, \mathrm{d} t^{2}+\mathrm{d} x^{2} \tag{5.67} \end{equation*}
represents flat space in coordinates $(T, X)$ by investigating the coordinate transformation
\begin{equation*} X=x \cosh t, \quad T=x \sinh t \tag{5.68} \end{equation*}
(5.3) Consider transforming into a reference frame moving at a constant speed $v$ along the $x$-axis.
(a) Using the transformation $x=x^{\prime}-v t$, $y=y^{\prime}$, $z=z^{\prime}$, $t=t^{\prime}$, show that the Minkowski metric line element becomes, in a moving-coordinate reference frame,
\begin{equation*} \mathrm{d} s^{2}=-\mathrm{d} t^{\prime 2}\left(1-v^{2}\right)+\mathrm{d} x^{\prime 2}+\mathrm{d} y^{\prime 2}+\mathrm{d} z^{\prime 2}-2 v\, \mathrm{d} x^{\prime}\, \mathrm{d} t^{\prime}. \tag{5.69} \end{equation*}
(b) What is the light cone structure of this metric?
(c) What is the proper time interval measured along an observer's world line?
(d) Writing the line element as a matrix with components $g_{\mu \nu}$, compute the inverse matrix with components $g^{\mu \nu}$.
(5.4) Consider a metric line element
\begin{equation*} \mathrm{d} s^{2}=-\mathrm{d} t^{2}+a(t)^{2}\left[\frac{\mathrm{d} r^{2}}{1-k r^{2}}+r^{2}\left(\mathrm{d} \theta^{2}+\sin ^{2} \theta\, \mathrm{d} \phi^{2}\right)\right] \tag{5.70} \end{equation*}
where $k$ is a constant.
(a) What is the proper length between events that occur at $(r, \theta, \phi)$ and $(r+\mathrm{d} r, \theta, \phi)$?
(b) A light pulse is sent from $\left(t_{\mathrm{em}}, \chi, 0,0\right)$ to $\left(t_{\mathrm{ob}}, 0,0,0\right)$. Show that the photon's path can be described by
\begin{equation*} \int_{t_{\mathrm{em}}}^{t_{\mathrm{ob}}} \frac{\mathrm{d} t}{a(t)}=\int_{\chi}^{0} \frac{\mathrm{d} r}{\left(1-k r^{2}\right)^{\frac{1}{2}}} \tag{5.71} \end{equation*}
(5.5) Consider again the metric line element for Rindler space
\begin{equation*} \mathrm{d} s^{2}=-x^{2}\, \mathrm{d} t^{2}+\mathrm{d} x^{2} \tag{5.72} \end{equation*}
(a) What is the proper length interval between events at $x=x_{a}$ and $x_{a}+\mathrm{d} x$?
(b) What is the proper time interval between events at $t_{b}$ and $t_{b}+\mathrm{d} t_{b}$?

Part II

Curvature and general relativity

In this part of the book, we introduce the tools needed to understand the curvature of spacetime and its relationship to matter, culminating in the Einstein field equation.
  • In Chapter 6, we introduce some of the key ideas behind general relativity, and in particular the equivalence principle.
  • In Chapter 7, we describe the notion of what it means for vectors to be parallel in curved spaces. We introduce connection coefficients which allow us to take derivatives of vectors in curved space.
  • In Chapters 8 and 9, we investigate geodesics: the paths that particles fall along in spacetime.
  • Although we study curved space, we make physical observations locally, in the flat space of our experience. The method to translate between different frames of reference is described in Chapter 10.
  • In Chapter 11, we explain how the curvature of spaces is described using the Riemann tensor and the Ricci tensor. These will supply the left-hand side of the Einstein equation.
  • The right-hand side of Einstein's equation is supplied by the energy-momentum tensor, discussed in Chapter 12.
  • In Chapter 13, we write down the Einstein field equation, which is the foundation of general relativity.
  • In Chapter 14, we review some of the successes of general relativity that follow from the formalism described in this part of the book. Many of the topics described in this chapter will then be unpacked in more detail in the rest of the book.

6

Finding a theory of gravitation

6.1 Free fall and the equivalence principle
6.2 Why general relativity?
6.3 A differential equation to describe gravity
6.4 Local flatness
6.5 Time dilation in a gravitational field
Chapter summary
Exercises
A little reflection will show that the law of the equality of the inertial and gravitational mass is equivalent to the assertion that the acceleration imparted to a body by a gravitational field is independent of the nature of the body... It is only when there is numerical equality between the inertial and gravitational mass that the acceleration is independent of the nature of the body.
Albert Einstein
Newtonian gravitation acts instantaneously across the Universe and therefore picks out a unique time for all observers when the gravitational interaction occurs. This is inconsistent with relativity and its treatment of simultaneity. We must therefore look beyond Newton's theory for a complete description of gravitation. We begin our search with two claims. (1) A person falling under gravity can't feel their own weight. This is a statement of the principle of equivalence. (2) The metric alone describes the role of spacetime in the laws of physics. This is an expression of general covariance. Taken together, these two ideas, which turn out to be closely linked, will guide us towards a theory of gravitation.

6.1 Free fall and the equivalence principle

Central to general relativity is the notion that the inertial mass $m_{\mathrm{i}}$ of a particle and its gravitational mass $m_{\mathrm{g}}$ are identical. If we write an equation of motion for a particle in a gravitational field as
\begin{equation*} m_{\mathrm{i}} \ddot{\vec{x}}=m_{\mathrm{g}} \vec{g}(\vec{x}), \tag{6.1} \end{equation*}
then, taking $m_{\mathrm{i}}=m_{\mathrm{g}}$, the equation of motion tells us that the gravitational field $\vec{g}(\vec{x})$ acting on a particle is equal to the acceleration $\ddot{\vec{x}}$ of the coordinates of the particle. Einstein's great insight was to grasp that gravity and an accelerating coordinate system are actually the same thing. A freely falling$^{1}$ astronaut sees, by definition, no change in her coordinates in her local rest frame and so concludes that there is no gravitational field. To put it another way, a freely falling observer cannot feel their own weight.
The weak principle of equivalence$^{2}$ is a statement that gravitational and inertial mass are identical. This implies that it is possible to choose a coordinate system in which the laws of motion of a freely falling particle take the same form as in unaccelerated Cartesian coordinates in the absence of gravitation. An observer $O$ on Earth and her freely falling astronaut friend $O^{\prime}$ detect no difference in the laws of mechanics, except that $O$ observes the effect of, and herself feels, a gravitational field, while freely falling $O^{\prime}$ does not.$^{3}$

Example 6.1

Consider a cloud of $N$ test particles,$^{4}$ each with mass $m$, subject to a uniform gravitational field $\vec{g}$. Let's fix our attention on one particular particle, which will have an equation of motion of the form
\begin{equation*} m \frac{\mathrm{d}^{2} \vec{x}}{\mathrm{d} t^{2}}=m \vec{g}+\sum_{p=1}^{N-1} \vec{F}\left(\vec{x}-\vec{x}_{p}\right) \tag{6.2} \end{equation*}
which is to say that it feels the force of gravity and the non-gravitational interaction forces from the $N-1$ other particles. This equation of motion describes the dynamics within the coordinate system of a particular observer $O$. Next we make the coordinate transformation to another frame of reference that is uniformly accelerating with acceleration $\vec{g}$ in the $-x$-direction. The coordinate transformation is
\begin{equation*} \vec{x}^{\prime}=\vec{x}-\frac{1}{2} \vec{g} t^{2}, \quad t=t^{\prime} \tag{6.3} \end{equation*}
These coordinates describe the viewpoint of the freely falling observer $O^{\prime}$. The equation of motion becomes
\begin{equation*} m \frac{\mathrm{d}^{2} \vec{x}^{\prime}}{\mathrm{d} t^{\prime 2}}=\sum_{p=1}^{N-1} \vec{F}\left(\vec{x}^{\prime}-\vec{x}_{p}^{\prime}\right) \tag{6.4} \end{equation*}
which looks like the equation of motion for the particle in the absence of the gravitational field $\vec{g}$.
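The transformation in this example can be illustrated with a small one-dimensional simulation. The sketch below integrates eqn 6.2 for three particles in a uniform field and, separately, eqn 6.4 with no gravity term, then applies eqn 6.3 to the first result and compares. The linear interaction force $F(d)=-k d$, the initial conditions, the velocity-Verlet integrator and all names are assumptions made purely for illustration; any force that depends only on particle separations would serve equally well.

```python
def accelerations(xs, g, k=4.0, m=1.0):
    """a_i = g + (1/m) * sum_p F(x_i - x_p), with the assumed force F(d) = -k*d.
    The p = i term is included for brevity; it contributes zero."""
    return [g + sum(-k * (xi - xp) for xp in xs) / m for xi in xs]

def evolve(xs, vs, g, t_end=1.0, steps=1000):
    """Velocity-Verlet integration of the equations of motion in one dimension."""
    dt = t_end / steps
    xs, vs = list(xs), list(vs)
    acc = accelerations(xs, g)
    for _ in range(steps):
        xs = [x + v * dt + 0.5 * a * dt**2 for x, v, a in zip(xs, vs, acc)]
        new_acc = accelerations(xs, g)
        vs = [v + 0.5 * (a + b) * dt for v, a, b in zip(vs, acc, new_acc)]
        acc = new_acc
    return xs

g, t_end = -9.81, 1.0                       # uniform field and elapsed time (assumptions)
x0, v0 = [0.0, 1.0, 2.5], [0.3, 0.0, -0.2]  # illustrative initial conditions

in_frame_O = evolve(x0, v0, g, t_end)       # observer O: gravity present (eqn 6.2)
free_fall = evolve(x0, v0, 0.0, t_end)      # observer O': no gravity term (eqn 6.4)
transformed = [x - 0.5 * g * t_end**2 for x in in_frame_O]   # eqn 6.3

print(transformed)
print(free_fall)
```

The two printed lists should agree to rounding error: after the coordinate change the uniform field has disappeared from the dynamics, leaving only the interaction forces.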

Example 6.2

The principle of equivalence can also be illustrated by considering the acceleration measuring device in Fig. 6.1. It consists of a mass in a box suspended by springs. If the box is carried by an observer then it provides a measure of acceleration: an observer accelerating in the positive $x$-direction will see the mass displaced in the negative $x$-direction. The mass will also be affected by the presence of a gravitational field, which causes a displacement too, but there is no way to distinguish this from the effect of an acceleration.
We can strengthen the equivalence principle to generalize it beyond the realm of mechanics. One form of the strong principle of equivalence says that it is possible to find a frame of reference where all non-gravitational laws of physics take on their special relativistic forms. In more detail:
The strong principle of equivalence: at every spacetime point in an arbitrary gravitational field it is possible to choose a local coordinate system such that, within a sufficiently small region of the point in question, all of the laws of nature take the same form as in unaccelerated Cartesian coordinate systems in the absence of gravitation.
$^{3}$ There are some caveats. One is that no difference is detected over a small region of space and time. The size of the region is described below.
$^{4}$ In this chapter, test particles have a low mass compared to the source of the gravitational field. As a result, we neglect any gravitational interaction between them.
Fig. 6.1 A machine to measure acceleration consisting of a mass suspended in a box by light springs.
$^{5}$ Tidal forces arise due to the difference in gravitational field strength across a body. The gravitational field due to the Sun and Moon varies across the Earth and this results in a tidal force on our planet and its oceans. The tides (the rise and fall of sea level due to the motion of the Moon and the Sun) originate due to the greater responsiveness to tidal forces of fluid water (oceans) than solid rock (the Earth). Even the solid Earth responds a little bit to tidal forces (the so-called 'Earth tide') and this is responsible for the Large Electron-Positron Collider at CERN stretching a few millimetres from its circumference of about 27 km as the Earth stretches, something the CERN scientists have to correct for.
Fig. 6.2 (a) In a uniform field, two test particles released along parallel paths at different positions will both accelerate in the same direction. A coordinate transformation can be used to remove this acceleration. (b) A real gravitational field, such as that generated by a planet, is never uniform and so test particles initially released along parallel paths will approach each other. The apparent force causing acceleration between them is called a tidal force.

Example 6.3

Special relativity tells us that a particle at rest in an inertial frame moves along the time axis. The strong principle of equivalence tells us that the same must be true in general relativity, so that free-falling particles follow a curve whose tangent vector is always timelike (known as a timelike curve). Such timelike curves are called geodesics. The equivalence principle has ended up being surprisingly powerful because it has allowed us to take a result from special relativity (in which gravity is completely absent) and deduce from it an important result in general relativity (which includes gravity): free-falling particles follow timelike curves!
Let's emphasize an important idea: at the particular point where the observer is localized, there is no way to distinguish between a gravitational field and acceleration. Therefore, we can transform away the apparent effect of acceleration to give the physics expected from special relativity in the absence of gravity. As a result of the strong principle of equivalence, no experiment can distinguish between a homogeneous gravitational field and an accelerating reference frame. That is, if all points experience a uniform gravitational field (as they did in Example 6.1), then we can find a coordinate system where we transform away the equivalent effects of the gravitational field and acceleration for all points, so that it appears that gravity is not acting. In less technical language: a uniform gravitational field is equivalent to there being no gravitational field. This would seem to make hunting for the effects of gravitation a hopeless endeavour, since we could never be sure that an apparent gravitational effect wasn't simply the effect of an accelerating coordinate system. All is not lost, however, because real gravitational fields are never homogeneous! Over a sufficiently large distance, a real gravitational field can be distinguished from an accelerating reference frame. Distinguishing them can be done by noticing the presence of tidal forces,$^{5}$ as shown in the following example.

Example 6.4

An observer awakes in a spaceship feeling the apparent effect of gravity holding them in bed. Is this due to the acceleration of the ship or to the gravitational field of a nearby planet? A planet's field, assumed spherically symmetric, is not homogeneous, so can, in principle, be detected. The observer releases two test particles, initially moving parallel to each other (Fig. 6.2). If the particles start to move towards each other (or away from each other) we attribute this to a tidal force. The presence of a tidal force tells the observer that (s)he is not simply in an accelerating frame and therefore gravity must be acting.
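The observer's test can be mimicked with a short Newtonian simulation. The sketch below releases two particles from rest, side by side, and tracks their separation, once in the field of a point-mass planet and once in a uniform field. The units ($GM=1$), the starting positions, the duration and the simple semi-implicit Euler integrator are all assumptions chosen for illustration.

```python
import math

def acceleration(pos, uniform):
    """Field of a point-mass planet at the origin (GM = 1), or a uniform field."""
    x, y = pos
    if uniform:
        return (0.0, -1.0)
    r3 = math.hypot(x, y) ** 3
    return (-x / r3, -y / r3)

def separation_after(t_end, uniform, steps=2000):
    """Release two particles from rest side by side; return their final separation."""
    dt = t_end / steps
    pos = [[-0.05, 1.0], [0.05, 1.0]]   # initial separation of 0.1 (assumption)
    vel = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(steps):
        for p, v in zip(pos, vel):
            ax, ay = acceleration(p, uniform)
            v[0] += ax * dt             # semi-implicit Euler: update velocity first,
            v[1] += ay * dt
            p[0] += v[0] * dt           # then position with the new velocity
            p[1] += v[1] * dt
    return math.hypot(pos[0][0] - pos[1][0], pos[0][1] - pos[1][1])

sep_uniform = separation_after(0.5, uniform=True)
sep_planet = separation_after(0.5, uniform=False)
print(sep_uniform, sep_planet)
```

In the uniform field the separation stays at its initial value, while in the planet's field it shrinks because both particles are pulled towards the same centre. That relative acceleration is the tidal effect the observer is looking for.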
Our investigation of the equivalence principle allows us to learn a number of lessons that will be very important in the rest of this book as we describe general relativity.
  • Lesson 1: A freely falling observer does not feel any force and so can't tell if a gravitational field is present. They fall along a timelike curve called a geodesic.$^{6}$
  • Lesson 2: Locally, the freely falling observer can set up a laboratory covered by the unaccelerated coordinate system familiar from special relativity. This is an example of a local inertial frame (LIF).$^{7}$ All laws of physics described in a LIF are those from special relativity.
  • Lesson 3: If measurements are made over a sufficiently small time frame and length scale, the observer can never detect the effect of gravitation. However, a real gravitational field will be inhomogeneous, so an observer can detect the effects of gravitation by making measurements at different points in spacetime that show the effects of tidal forces.$^{8}$
Let's now get a quick physics payoff from the equivalence principle and show that light is bent by a gravitational field.

Example 6.5

Consider the experiment shown in Fig. 6.3(a) in which a rocket accelerates upwards with an acceleration $g$. A particle passes through the windows of the rocket and travels through the interior of the rocket, but by the time it has passed through the width of the rocket, the rocket has moved upwards and so it ends up landing on the rocket wall at a position which is lower than the point at which it entered. Figure 6.3(b) illustrates that from the point of view of an astronaut inside the rocket the particle follows a curved trajectory.$^{9}$ The parabolic path would follow the equations $x=x_{0}+v t$, where $v$ is the velocity of the particle, and $y=y_{0}-\frac{1}{2} g t^{2}$, where $x$ and $y$ are the coordinates [horizontal and vertical respectively, in Fig. 6.3(b)] measured in the rocket frame and $\left(x_{0}, y_{0}\right)$ are the coordinates of the point where the particle enters the rocket at $t=0$. Thus, measuring $x$ from the entry point (so that $x_{0}=0$), $y=y_{0}-g x^{2} /\left(2 v^{2}\right)$.
So far, this analysis has shown nothing remarkable. But now we can deploy the equivalence principle, which implies that the astronaut cannot tell whether her rocket is accelerating upwards, with an acceleration of $g$, or simply that she is experiencing a gravitational field equal to $g$. So it is possible that her rocket is parked on a planet with a gravitational field of $g$ and the bending of the particle trajectory would still occur. We can make the experiment particularly vivid by imagining that the particle is a photon, so that it would suggest that light is bent by gravitational fields! Our analysis shows that, for our rocket problem, the light beam changes from purely horizontal to travelling at an angle of $\approx|\mathrm{d} y / \mathrm{d} x|=g x / c^{2}$ to the horizontal.
This argument is only part of the story, however. It is quantitatively correct for a particle beam travelling at a non-relativistic velocity, but it turns out that the bending effect for photons in the field of a spherical mass is actually three times larger than suggested by this analysis. The problem for the photon case (or any particle travelling at relativistic speeds) is in the incorrect assumption that our experiment is carried out over a sufficiently small distance to allow our straightforward application of the equivalence principle. However, the intuition that light should bend is correct, even if the size of the bending is not, and our argument does account for one third of the total deflection effect.
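To get a feel for the sizes involved in the rocket argument, here is a minimal sketch that evaluates the drop $g x^{2} /\left(2 v^{2}\right)$ and the slope $g x / v^{2}$ in the rocket frame. The cabin width of 5 m, the acceleration of 9.8 m/s$^2$ and the slow-particle speed of 100 m/s are illustrative assumptions; setting $v=c$ gives only the naive equivalence-principle estimate for light discussed above.

```python
C = 2.998e8  # speed of light, m/s

def drop(x, v, g=9.8):
    """Vertical drop y0 - y after a horizontal distance x (entry point at x0 = 0)."""
    return g * x**2 / (2 * v**2)

def slope(x, v, g=9.8):
    """Magnitude of dy/dx after a horizontal distance x."""
    return g * x / v**2

width = 5.0  # illustrative cabin width, m
for label, v in [("slow particle (100 m/s)", 100.0), ("light (naive estimate)", C)]:
    print(f"{label}: drop = {drop(width, v):.3e} m, slope = {slope(width, v):.3e} rad")
```

For the slow particle the drop is about a centimetre; for light the slope is of order $10^{-16}$ rad, which is why the effect is invisible in any laboratory-sized cabin.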
$^{6}$ We can contrast the difference in philosophy between the Newtonian and Einsteinian views of gravitation. In Newtonian gravitation, the Sun exerts a force on the Earth; in Einsteinian gravitation, the Earth feels no force and simply falls freely along a geodesic which is a path that orbits the Sun.
$^{7}$ If orthonormal basis vectors are used to describe a LIF, as in the case of the usual conventions of special relativity, the frame is sometimes called a local Lorentz frame.
$^{8}$ This notion of 'sufficiently small' provides the caveat for the earlier sidenote.
$^{9}$ This is nothing particularly to do with special relativity (the rocket is only starting to accelerate and so its speed is much less than $c$) and the same effect would be seen with an accelerating car in vertical rain (with the diagram rotated).
$^{10}$ This was the effect that was measured very early in the history of relativity and gave an initial vindication of Einstein's ideas.
$\curvearrowright$ Section 24.1 of Chapter 24 presents the calculation done properly.
Fig. 6.3 (a) Time snapshots in an inertial frame in which a rocket accelerates (upwards in this diagram). A particle passes through the windows of the rocket in a direction perpendicular to the direction of acceleration of the rocket. (b) In the rocket's frame, the path of the particle follows a parabolic trajectory.
We'll see later how gravitation follows from the curvature of spacetime encoded in the components of the metric tensor. This curvature can affect intervals in space and also in time; the argument presented above only takes the time part into account. However, the measurement of the effect for (highly relativistic) photons in a gravitational field from a spherically symmetric mass requires us to consider the space part too, since the deflection from this enters at the same order as that of the time part, owing to the photon exploring a spatial distance large enough to experience the spatial contribution to the curvature. The point is that this measurement is made over a distance where the curvature of space can be discerned, and is therefore not local enough to allow the straightforward application of the equivalence principle. If you treat the spatial part of the curvature as well, you recover the factor of three missing from the present analysis.
In spite of these difficulties, let's now use our result to have a first, hand-waving attempt at the famous problem of the bending of starlight around the Sun, visible during a total solar eclipse.$^{10}$ We will treat the problem properly later, but we will here make a crude estimate where we discard numerical factors with wild abandon. Starlight passing close to the surface of the Sun will experience a gravitational field equal to $g=G M_{\odot} / R_{\odot}^{2}$. This is where most of the bending will occur, but there will also be bending when the light is further away. Crudely, we can say that the starlight will be bent over a distance which must scale as $R_{\odot}$ and the gravitational field that it will experience will be of order $g=G M_{\odot} / R_{\odot}^{2}$. Therefore, the angle of deflection $\theta$ (using our previous result that light is bent by an angle given very roughly by $|\mathrm{d} y / \mathrm{d} x|=g x / c^{2}$) will be
\begin{equation*} \theta \approx \frac{g R_{\odot}}{c^{2}}=\frac{G M_{\odot}}{R_{\odot} c^{2}} \tag{6.5} \end{equation*}
Remarkably, this turns out to be the correct answer apart from a factor of 4.
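Putting numbers into eqn 6.5 is a quick check. The sketch below uses rounded standard values for $G$, $M_{\odot}$, $R_{\odot}$ and $c$ (these numerical inputs are our assumptions, not quoted in the text) and converts the result to arcseconds, with and without the factor of 4 mentioned above.

```python
import math

G = 6.674e-11      # gravitational constant, m^3 kg^-1 s^-2
M_SUN = 1.989e30   # solar mass, kg
R_SUN = 6.957e8    # solar radius, m
C = 2.998e8        # speed of light, m/s

RAD_TO_ARCSEC = 180 / math.pi * 3600

theta_crude = G * M_SUN / (R_SUN * C**2)   # eqn 6.5
theta_full = 4 * theta_crude               # including the factor of 4 quoted in the text

print(f"crude estimate: {theta_crude * RAD_TO_ARCSEC:.2f} arcsec")
print(f"with factor 4:  {theta_full * RAD_TO_ARCSEC:.2f} arcsec")
```

The crude estimate comes out at roughly 0.44 arcseconds, and the factor of 4 brings it to about 1.75 arcseconds.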
We can now attempt to find gravitational fields by transforming away all effects of uniform acceleration from our coordinate systems and conclude that whatever is left over at the end must be gravity. However, we still lack a clear guide to help us formulate a theory of relativistic gravitation. Before moving on, we present a brief historical interlude. Einstein's own motivation in invoking the principle of equivalence is often explained in terms of the influence of Mach's principle on his thinking, as discussed in the next example.
Example 6.6
Newton's laws appear to have restricted applicability: they apply in inertial frames. In non-inertial frames, new inertial forces appear.$^{11}$ These present a philosophical difficulty if we accept the principle of relativity, since they imply that mysterious new forces appear in non-inertial frames, but it is not immediately clear if these are exerted by space itself or by other bodies.
How do we define an inertial or non-inertial frame if there is no absolute space against which to judge whether the frame is accelerating or not? Ernst Mach$^{12}$ came up with a solution that was a great influence on Einstein. Mach says we can judge a motion against the motion of all of the matter in the Universe. We say an inertial frame is unaccelerated with respect to fixed stars.$^{13}$ This is helpful in that it allows us to use relativity to propose that inertial forces arise because of the acceleration of a mass relative to this fixed frame or, equivalently, the acceleration of the fixed frame with respect to the mass. This saves Newton's laws: they apply in all frames of reference with the extra non-inertial forces being real, physical forces that arise from the motion of the stars. We can then attempt to formulate a theory of inertial forces as arising from the interaction of inertial masses. By analogy with electromagnetism this would have a static contribution like Coulomb's law that varies according to $\alpha m_{i} m_{j} / r^{2}$, with $\alpha$ a constant. It might also have more complicated parts that contribute. For example, when masses are accelerating relative to each other$^{14}$ we would expect a force of the form $\beta m_{1} m_{2} / r$, with $\beta$ a constant.
Now, if we accept the principle of equivalence, then (the static part of) the inertial interaction can be directly identified with the gravitational force. The equivalence principle is then a principle of the equivalence of the gravitational force and the static part of the inertial force. We would then conclude that the relative acceleration of masses gives rise to a $1 / r$ contribution to the force. This does not exhaust the possible contributions to the inertial force, however, and so we should expect further contributions to the inertial part of the force law.
So why not continue along this line and describe gravitation by considering the interactions between particles in this way? The reason is that we are faced with a very complicated nonlinear problem. This is because mass and energy are interchangeable, and mass-energy is the source of gravitation. For example, when two particles act as a source of gravitation then to correctly evaluate the total size of the source of the gravitational effect, we must compute not only their individual gravitational potential energies but also the interaction potential energy between them. Each of these makes a contribution to the gravitational force. Owing to this complication of nonlinearity, we must abandon the discussion in terms of the interaction of individual masses. Fields then become important as they simplify our task. They allow us to avoid the question of how particles directly influence each other. Instead, a particle contributes to, and is acted on by, a field that is defined locally. In the field viewpoint, we therefore restrict our view to a local point in spacetime, and compute the effect of gravitation at the point of interest.$^{15}$

6.2 Why general relativity?

Einstein's theory of gravitation is called general relativity because it is founded on the principle of general covariance.$^{16}$
The principle of general covariance: the laws of physics must be invariant under all coordinate transformations, so the laws must hold in different coordinate systems.
This principle is, in a way, a restatement of the equivalence principle, but framing it this way is very helpful because it puts constraints on the sorts of equations we can write down that will be physically valid. For a start, we can't have any preferred bases in our theory (any more than in Newtonian mechanics$^{17}$ we could claim that the $y$-axis was particularly special). This means that we can't write down a law which singles out some specific components of a vector or a tensor, for the simple reason that this would look very different in different frames of reference.
$^{11}$ An example is the centrifugal force experienced in a rotating frame. See Chapter 8 for a discussion of inertial forces.
$^{12}$ Ernst W. J. W. Mach (1838-1916).
$^{13}$ 'When the subway jerks, it's the fixed stars that throw you down.' Ernst Mach, attributed by Philipp Frank (1884-1966).
$^{14}$ An accelerating charge in electromagnetism exerts a force that varies as $1 / r$. This is closely related to the interaction that gives rise to the emission of electromagnetic radiation from an accelerating charge.
$^{15}$ Even using fields, nonlinearity continues to make the problem a complicated one. This can be made clearer by considering the contrasting case of electromagnetism. In electromagnetism, charges are the sources of electromagnetic fields and these fields add linearly. The charges determine the fields and the field determines the motion of the charges. An electromagnetic wave passes through another electromagnetic wave without scattering precisely because the theory is linear: the (electrically neutral) waves are not sources of electric field. This is not the case for gravitation. For example, gravitational waves interact with other gravitational waves since a gravitational wave is also a source of gravitation.
$^{16}$ As pointed out by several authors, 'general relativity' is something of a misnomer: it doesn't describe a form of relativity that is more general than special relativity.
$^{17}$ The law
\begin{equation*} \boldsymbol{F}=\frac{\mathrm{d} \boldsymbol{p}}{\mathrm{d} t} \end{equation*}
is a relationship between two geometrical quantities (in this case vectors) and no preferred basis is singled out. On the other hand, if the equation were something like
\begin{equation*} \boldsymbol{F}=\boldsymbol{p} \frac{\mathrm{d} p_{y}}{\mathrm{d} t} \end{equation*}
we would smell a rat, not only because it's not dimensionally correct but because it somehow makes the $y$-direction special and so it wouldn't transform sensibly.
$^{18}$ Indeed, we could attempt to regard this equation as a component of a more general tensor equation. As we will discover later, the left-hand side of the correct equation will turn out to be something to do with spacetime derivatives (but we will need a more sophisticated notion of curvature to get this right) and the right-hand source term is indeed something to do with the mass density, but in fact we need the energy-momentum tensor introduced in Section 4.5.
$^{19}$ A valid tensor equation can often be identified as having a component version whose indices match on both sides of the equation. Such an equation is sometimes called manifestly covariant.
$^{20}$ By $\operatorname{diag}(-1,1,1,1)$ we mean a $4 \times 4$ matrix with these diagonal elements and all other elements equal to zero.

Example 6.7

Failed theory of gravitation: Based on what we have learned in the book so far, here is a misguided attempt at a theory of gravitation and then an explanation of why it won't work. We know Newton's approach to gravitation ended up with $\nabla^{2} \Phi=4 \pi G \rho$ (eqn 15 of Chapter 0), so let's try and fix it up by using the 4-vector generalization of $\vec{\nabla}^{2}$ which is $\partial^{2}=\partial_{\mu} \partial^{\mu}=-\partial_{t}^{2}+\vec{\nabla}^{2}$. So our patched up theory of gravitation could be written down as
\begin{equation*} \partial^{2} \Phi=-\frac{1}{c^{2}} \frac{\partial^{2} \Phi}{\partial t^{2}}+\vec{\nabla}^{2} \Phi=4 \pi G \rho \tag{6.6} \end{equation*}
This equation looks like a worthy guess, but it fails at the first hurdle - general covariance! It doesn't work in different frames of reference. The source term $\rho$ is not a scalar and will transform when you go into different inertial frames. So this equation will not do, but it is heading in the right direction.$^{18}$
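The claim that $\rho$ is not a scalar can be checked with a small calculation. As an illustrative assumption (not taken from the example itself), take pressureless dust at rest, whose energy-momentum tensor has the single non-zero component $T^{00}=\rho_{0}$ in units with $c=1$, and boost it along $x$. The sketch below applies $T^{\prime \mu \nu}=\Lambda^{\mu}{ }_{\alpha} \Lambda^{\nu}{ }_{\beta} T^{\alpha \beta}$ in plain Python; the helper names are hypothetical.

```python
import math

def boost_x(v):
    """Lorentz boost matrix along x for speed v (units with c = 1)."""
    gamma = 1.0 / math.sqrt(1.0 - v**2)
    L = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    L[0][0] = L[1][1] = gamma
    L[0][1] = L[1][0] = -gamma * v
    return L

def transform(L, T):
    """Transform a rank-2 contravariant tensor: T'^{mu nu} = L^mu_a L^nu_b T^{ab}."""
    return [[sum(L[m][a] * L[n][b] * T[a][b] for a in range(4) for b in range(4))
             for n in range(4)] for m in range(4)]

rho0 = 1.0                                  # rest-frame density (illustrative)
T_rest = [[0.0] * 4 for _ in range(4)]
T_rest[0][0] = rho0                         # dust at rest: only T^{00} is non-zero

v = 0.6
T_moving = transform(boost_x(v), T_rest)
gamma = 1.0 / math.sqrt(1.0 - v**2)

print("density in rest frame:   ", T_rest[0][0])
print("density in boosted frame:", T_moving[0][0])
print("gamma^2 * rho0:          ", gamma**2 * rho0)
```

The density seen in the boosted frame is $\gamma^{2} \rho_{0}$ rather than $\rho_{0}$: one factor of $\gamma$ from the increased energy of each particle and one from length contraction of the volume. A quantity that changes like this cannot sit on the right-hand side of a scalar equation such as eqn 6.6.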
The equivalence principle has told us that any physical law that can be expressed in special relativity without gravity will hold in a LIF, even when gravity is present. General covariance then tells us that the law holds in the same form in different coordinate systems. We choose to express our laws using tensors and so a valid tensor equation$^{19}$ compatible with special relativity also holds in a LIF even when gravity is present, and this same equation should take the same form in any coordinate system in the presence of gravity. In short: a valid tensor equation in the absence of gravity is a valid tensor equation in the presence of gravity. But something must change in the mathematics to herald the presence of gravitation! That something is geometry. The geometry of spacetime is altered by the presence of gravity, causing it to become curved. This is expressed in terms of the metric tensor. In the LIF of special relativity, the metric tensor, which can be thought of as a box of clocks and rulers that tells us how to measure vectors, is given by the Minkowski metric tensor $\boldsymbol{\eta}$, whose components are$^{20}$ $\operatorname{diag}(-1,1,1,1)$. However, in a general frame of reference, the metric is written as a tensor $\boldsymbol{g}$, whose components (i) are different to those of $\boldsymbol{\eta}$ and (ii) will vary in space and time. This provides our next lesson.
  • Lesson 4: The effect of gravitation on our tensor equations describing physical laws is to change the Minkowski metric $\boldsymbol{\eta}$ to a new metric $\boldsymbol{g}$.
Our strategy to find physical theories in the presence of gravitation will be to take a valid tensor equation that works in special relativity and upgrade it, simply by changing the Minkowski metric $\boldsymbol{\eta}$ to the general metric tensor $\boldsymbol{g}$. The next step in our search for a theory of gravity is therefore to seek a theory that determines this metric $\boldsymbol{g}$ in the presence of gravitation and also obeys the principles of equivalence and of general covariance.

6.3 A differential equation to describe gravity

The metric is a rulebook that allows us access to the distances and angles between points in spacetime. For example, the interval in spacetime along a curve $x^{\mu}(\lambda)$ from $\lambda=a$ to $\lambda=b$ may be worked out using the metric via the prescription
\begin{equation*} s=\int_{a}^{b} \mathrm{d} \lambda\left|g_{\mu \nu} \frac{\mathrm{d} x^{\mu}}{\mathrm{d} \lambda} \frac{\mathrm{d} x^{\nu}}{\mathrm{d} \lambda}\right|^{\frac{1}{2}} \tag{6.7} \end{equation*}
In the absence of a gravitational field, we might evaluate this path length in any number of coordinate systems, but always obtain the same answer to questions such as the ratio of a circle's circumference to its diameter being $\pi$, or the angles in a triangle adding up to $\pi$. The only way in which this is possible is if $\boldsymbol{g}$ at one point is related to $\boldsymbol{g}$ at another point. This implies that the tensor $\boldsymbol{g}$, a function of position $x$,$^{21}$ should satisfy a differential equation. It is this differential equation, the equation that tells us how the metric $\boldsymbol{g}(x)$ varies in spacetime, that allows us to separate the true effects of gravity from those effects that result from a particular choice of coordinates. From this point of view, the metric is a field. We can think of a field$^{22}$ as a machine into which we input a position in spacetime $x^{\mu}=y^{\mu}$. The field outputs the value of the tensor $\boldsymbol{g}(y)$ appropriate for the point $y$. The field $\boldsymbol{g}$ must obey a differential equation of motion that we'll call a field equation.
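A minimal numerical sketch of eqn 6.7 may help here. As a simplifying assumption we work in a flat two-dimensional plane rather than in spacetime, and compute the circumference of a circle of radius 2 twice: once in Cartesian coordinates, where the metric components are $\operatorname{diag}(1,1)$, and once in polar coordinates $(r, \phi)$, where they are $\operatorname{diag}\left(1, r^{2}\right)$. The function names and the midpoint-rule integration are illustrative choices.

```python
import math

def path_length(metric, curve, a, b, n=20000):
    """Approximate eqn 6.7 with the midpoint rule.

    metric(x) returns the matrix of components g_{mu nu} at the point x;
    curve(lam) returns the coordinates x^mu at parameter value lam.
    """
    h = (b - a) / n
    eps = 1e-6
    total = 0.0
    for k in range(n):
        lam = a + (k + 0.5) * h
        x = curve(lam)
        x_plus, x_minus = curve(lam + eps), curve(lam - eps)
        # central-difference estimate of dx^mu / d(lambda)
        dx = [(p - m) / (2 * eps) for p, m in zip(x_plus, x_minus)]
        g = metric(x)
        dim = len(x)
        ds2 = sum(g[i][j] * dx[i] * dx[j] for i in range(dim) for j in range(dim))
        total += math.sqrt(abs(ds2)) * h
    return total

R = 2.0

def cartesian_metric(x):
    return [[1.0, 0.0], [0.0, 1.0]]

def polar_metric(x):            # coordinates are (r, phi)
    return [[1.0, 0.0], [0.0, x[0] ** 2]]

def circle_cartesian(lam):
    return [R * math.cos(lam), R * math.sin(lam)]

def circle_polar(lam):
    return [R, lam]

s_cart = path_length(cartesian_metric, circle_cartesian, 0.0, 2 * math.pi)
s_polar = path_length(polar_metric, circle_polar, 0.0, 2 * math.pi)
print(s_cart / (2 * R), s_polar / (2 * R), math.pi)
```

The metric components look quite different in the two coordinate systems, yet both calculations return a ratio of circumference to diameter equal to $\pi$ to within numerical error, as they must for a flat space.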
The equation in question arranges the components of $\boldsymbol{g}$ and their derivatives into a new tensor that describes the curvature of spacetime,$^{23}$ called the Riemann tensor $\boldsymbol{R}$. It is curvature that is the true effect of gravitating mass and which can never simply be the result of having chosen a perverse set of coordinates. It will transpire that non-zero components of $\boldsymbol{R}$ tell us about curvature, and curvature means gravitation.
Despite these pointers, we still have relatively little guidance on how to put together the field theory of gravitation. However, there is one other clue: the theory must be compatible (i) with Newtonian gravitation and (ii) with special relativity. That is to say that, in the limit of weak gravitational fields and low velocities, the predictions of the field theory must recreate those of Newton's universal theory of gravitation. In the limit of vanishing gravitational field, the theory must agree with special relativity. Conceptually, therefore, the field theory of gravity (that is, general relativity) fits into the scheme shown in Fig. 6.4. We can summarize this section as follows:
  • Lesson 5: General relativity provides a field theory of the metric field $\boldsymbol{g}(x)$ which encodes gravitation through its effect in providing a curvature to spacetime.
Fig. 6.4 The relationship between general relativity, special relativity, Newtonian gravity and Newtonian mechanics as a function of velocity $v$ and gravitational constant $G$.
$^{24}$ The observer will need to set up their local frame to have the Lorentz signature [i.e. the $(-1,1,1,1)$ pattern of signs on the diagonal of the Minkowski metric].
$^{25}$ There is a distinction between a frame of reference and a set of coordinates. A frame of reference is defined by some basis vectors and so has an existence independent of a coordinate system.
  • A general LIF will be covered in coordinates for which $g_{\mu \nu}\left(x=x^{\alpha}(\mathcal{P})\right)=\eta_{\mu \nu}$ and $\partial g_{\mu \nu} /\left.\partial x^{\alpha}\right|_{x=x^{\alpha}(\mathcal{P})}=0$ in an infinitesimal region around the point $(t(\mathcal{P}), x(\mathcal{P}), y(\mathcal{P}), z(\mathcal{P}))$. This can be achieved using Riemann normal coordinates, described in Chapter 35.
  • The freely falling frame has its time direction $e_{t}$ tangent to a geodesic. It remains freely falling as a function of the local (proper) time so continues to be a LIF for the whole time it is falling. In terms of coordinates, it is only flat in a very small spatial region around the origin of the frame, but for long intervals of proper time.

6.4 Local flatness

The Minkowski metric can be used by any observer who is not subject to a gravitational field. Such a region of spacetime has no curvature and so Minkowski spacetime is flat. By the principle of equivalence, a freely falling observer does not feel a gravitational field providing they make measurements over a small enough region of spacetime. This implies that locally, all observers can treat spacetime as flat if they use a LIF. This is the content of the local flatness theorem.
It is always possible to reduce a metric field $\boldsymbol{g}(x)$, evaluated at a single point $x=\mathcal{P}$, to the Minkowski metric $\boldsymbol{\eta}$. That is, we can introduce coordinates $x^{\alpha}(\mathcal{P})$ such that the components of the tensors obey the equation
\begin{equation*} g_{\mu \nu}\left(x=x^{\alpha}(\mathcal{P})\right)=\eta_{\mu \nu} . \tag{6.8} \end{equation*}
This is possible since $\boldsymbol{g}$ is represented by a symmetric $4 \times 4$ matrix which can always be diagonalized. This implies that there are potentially lots of local frames (i.e. not just the special freely falling LIF we have described so far) that appear flat at a single point and, in these frames, the observer uses the Minkowski metric of flat spacetime to manipulate vectors.$^{24}$ The point of the local flatness theorem is that it is also possible to find coordinates such that the derivatives of the metric vanish at this point:
\begin{equation*} \left.\frac{\partial g_{\mu \nu}}{\partial x^{\alpha}}\right|_{x=x^{\alpha}(\mathcal{P})}=0 \tag{6.9} \end{equation*}
This means that spacetime will be described by the Minkowski metric in an infinitesimal region around $x=x^{\alpha}(\mathcal{P})$, making the notion of local flatness mathematically respectable. A local inertial frame (LIF) is a frame where this requirement is satisfied at some point $x=\mathcal{P}$. Note that in the presence of gravity it is not possible to find a vanishing second derivative of $\boldsymbol{g}$, since second derivatives will turn out to be related to spacetime curvature.$^{25}$
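The first half of this statement (eqn 6.8) is pure linear algebra at a single point and can be sketched in a few lines. The matrix below is a made-up example of metric components at some point $\mathcal{P}$, chosen to be symmetric with one negative and three positive directions. Instead of an eigenvalue routine we use a Gram-Schmidt procedure with respect to $\boldsymbol{g}$, which is our own choice of method and assumes no intermediate vector turns out to be null. Nothing here addresses the vanishing of the first derivatives in eqn 6.9.

```python
import math

def inner(g, u, v):
    """The number g_{mu nu} u^mu v^nu."""
    n = len(g)
    return sum(g[i][j] * u[i] * v[j] for i in range(n) for j in range(n))

def orthonormal_frame(g):
    """Gram-Schmidt with respect to g, starting from the coordinate basis.

    Assumes that no intermediate vector is null (true for the example below).
    """
    n = len(g)
    frame = []
    for k in range(n):
        v = [1.0 if i == k else 0.0 for i in range(n)]
        for e in frame:
            coeff = inner(g, v, e) / inner(g, e, e)
            v = [vi - coeff * ei for vi, ei in zip(v, e)]
        norm = math.sqrt(abs(inner(g, v, v)))
        frame.append([vi / norm for vi in v])
    return frame

# Hypothetical metric components at a single point P (symmetric, signature -+++)
g = [[-0.8, 0.1, 0.0, 0.0],
     [ 0.1, 1.2, 0.3, 0.0],
     [ 0.0, 0.3, 0.9, 0.2],
     [ 0.0, 0.0, 0.2, 1.5]]

eta = [[-1.0 if i == j == 0 else (1.0 if i == j else 0.0) for j in range(4)]
       for i in range(4)]

frame = orthonormal_frame(g)
# Components of g in the new basis: g'_{ab} = g(e_a, e_b)
g_new = [[inner(g, a, b) for b in frame] for a in frame]
for row in g_new:
    print([round(x, 10) for x in row])
```

In the new basis the components of $\boldsymbol{g}$ at $\mathcal{P}$ are those of $\boldsymbol{\eta}$ up to rounding. The frame is far from unique: any Lorentz transformation of it does just as well, which is the 'lots of local frames' referred to above.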
The freely falling frame of reference (used by the falling observer) that we have described in this chapter is an important example of a LIF. LIFs are very useful in that (i) as we've said, the laws of physics are identical in LIFs and in general frames, subject to the change $\boldsymbol{\eta} \rightarrow \boldsymbol{g}$; and (ii) analysing physics in LIFs is invariably far easier than doing so in curved spacetime.
We shall also use the idea that we can straightforwardly identify frames in which $\boldsymbol{g}=\boldsymbol{\eta}$ at the origin. By design, an observer erects an orthogonal set of basis vectors where they are situated, and normalizes this basis. The reason we're interested in these local orthonormal frames is that observations and measurements can be thought of as being made in them. This provides the final lesson in this chapter:
  • Lesson 6: An observation is made and interpreted by an observer in a local orthonormal frame, who uses the Minkowski tensor at the point they inhabit in spacetime.
The picture to have in mind is of the observer in a laboratory using their set of orthonormal axes, constructed from short, rigid rods, as an instrument to interpret the components of vectors locally.

6.5 Time dilation in a gravitational field

Our discussion so far has been very general and, consequently, a little abstract. We have yet to see how the curvature of spacetime that encodes gravity via the metric $\boldsymbol{g}$ has any effect beyond the possibility of Newtonian-style gravitational attraction. To give an idea of how $\boldsymbol{g}$ affects measurements we shall conclude this chapter by illustrating the influence of the metric on measurements made by two distant observers in an inhomogeneous gravitational field. Here gravitation leads to time dilation and a gravitational shift of the frequencies of light signals.

Example 6.8

Consider a clock at rest in some coordinate system. The clock ticks (which we will take to be infinitesimally separated by a coordinate interval $\mathrm{d} t$) will then be separated by the proper time interval $\mathrm{d} \tau$ where
\begin{equation*} -\mathrm{d} \tau^{2}=g_{\alpha \beta} \mathrm{d} x^{\alpha} \mathrm{d} x^{\beta}=g_{00} \mathrm{d} t^{2} \tag{6.10} \end{equation*}
where we have upgraded the flat metric $\eta_{\alpha \beta}$ to the curved-space metric $g_{\alpha \beta}$. If the clock were sitting 'at infinity', well away from any sources of gravitational field, it would indeed be in flat space, so $g_{00}=\eta_{00}=-1$ and $\mathrm{d} t=\mathrm{d} \tau$. However, if the clock were a distance $r$ from a star of mass $M$ then we could use the metric of eqn 5.22 (assuming the Newtonian limit holds) and hence $g_{00}=-(1-2 G M / r)$. In this case,$^{26}$
\begin{equation*} \mathrm{d} \tau=\left(-g_{00}\right)^{1 / 2} \mathrm{d} t=\left(1-\frac{2 G M}{r}\right)^{1 / 2} \mathrm{d} t \tag{6.11} \end{equation*}
That is to say that the interval $\mathrm{d} \tau$ between ticks of a clock measured in a particular frame depends on the details of the metric and hence on the gravitational field. Since $(1-2 G M / r)^{1 / 2}<1$, we have $\mathrm{d} \tau<\mathrm{d} t$ and so this is gravitational time dilation. Note that the factor $2 G M / r$ that enters this expression is just the square of the classical escape velocity $v_{\text{esc}}$ at distance $r$ from the star, which appears when you equate the kinetic energy $\frac{1}{2} m v_{\text{esc}}^{2}$ to the gravitational potential energy $G M m / r$ for a test mass $m$. Thus we could write eqn 6.11 as $\mathrm{d} \tau=\mathrm{d} t \sqrt{1-\left(v_{\text{esc}} / c\right)^{2}}$. If instead we wish to write this expression in terms of the Schwarzschild radius$^{27}$ $r_{\mathrm{S}}=2 G M / c^{2}$, then we could write it as
\begin{equation*} \mathrm{d} \tau=\mathrm{d} t \sqrt{1-\frac{r_{\mathrm{S}}}{r}} \tag{6.12} \end{equation*}
$^{26}$ Remember that we are using units for which $c=1$. The gravitational time-dilation factor is $\left[1-2 G M /\left(c^{2} r\right)\right]^{1 / 2}$ if you put the factors of $c$ back in.
$^{27}$ This quantity will be discussed in detail in Part IV of the book.
Fig. 6.5 Identical clocks 1 and 2 are held fixed, a long way from the star and at distance $r$ from it, respectively. (a) A third clock is released when it is next to clock 1. (b) Clock 3 is travelling at speed $v$ by the time it ends up next to clock 2. The diagram is schematic, so all three clocks and the star should be in a straight line (and of course the star will be much bigger!).
Let's derive eqn 6.11 a different way. Consider two identical clocks, the first held well away from the star and a second at a distance $r$ from it (see Fig. 6.5). Take a third identical clock, to measure the first two, and release it at the position of clock 1 [see Fig. 6.5(a)]. Clock 3 is in free fall in the gravitational field of the star, and so the interval between its ticks can be taken to be $\mathrm{d} \tau$ in its LIF. Immediately after releasing clock 3, we find that clock 1 and clock 3 agree (their ticks are in sync) because clock 3 is initially barely moving and so no relativistic corrections are needed (just as we found above: $\mathrm{d} t=\mathrm{d} \tau$). However, clock 3 starts to accelerate towards the star as it is drawn inexorably towards it, and by the time it reaches clock 2 it will be moving much faster, let's say at a speed $v$ [see Fig. 6.5(b)]. We are in the Newtonian limit, so its kinetic energy $\frac{1}{2} m v^{2}$ has been obtained from releasing gravitational potential energy $G M m / r$, implying that $v^{2}=2 G M / r$. Because clock 3 is instantaneously in free fall, gravity is absent in its inertial reference frame and so it has the same time interval between ticks $\mathrm{d} \tau$ as it had previously. However, the interval between ticks for clock 2 will be $\mathrm{d} t=\gamma \mathrm{d} \tau$ where $\gamma=\left(1-v^{2}\right)^{-1 / 2}$, so that once again
\begin{equation*} \mathrm{d} \tau=\left(1-\frac{2 G M}{r}\right)^{1 / 2} \mathrm{~d} t \tag{6.13} \end{equation*}
We have centred our discussion around clocks, but we could have framed the argument around atoms emitting light of a well-defined frequency due to some atomic transition. In this case, the frequency of the detected light from an atom in a gravitational field is found to be lower than that from the process occurring at infinity. In terms of wavelength, the light has been shifted towards the red end of the spectrum and, as a result, we call the effect gravitational redshift.
Bear in mind though that the calculation in this example was carried out in the Newtonian limit, meaning that $G M / R c^{2} \ll 1$, and so one needs to check that this limit holds before using eqn 6.11 to perform a calculation. However, if this limit does hold we can simplify eqn 6.11 and write the time-dilation factor as $1-G M / R$, using the binomial theorem to expand the square root.
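The binomial expansion mentioned here can be written out explicitly. As a short sketch, working in units with $c=1$ and treating $2 G M / R$ as the small parameter, the first few terms are
\begin{equation*} \left(1-\frac{2 G M}{R}\right)^{1 / 2}=1-\frac{1}{2}\left(\frac{2 G M}{R}\right)-\frac{1}{8}\left(\frac{2 G M}{R}\right)^{2}-\cdots=1-\frac{G M}{R}-\frac{G^{2} M^{2}}{2 R^{2}}-\cdots \end{equation*}
Dropping the quadratic and higher terms leaves $1-G M / R$. The first neglected term is smaller than the retained correction by a factor $G M /(2 R)$, which is why the approximation is only trustworthy when $G M / R \ll 1$.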
In Exercise 6.1, you can put numbers into these formulae, but suffice it to say that the gravitational redshift for an observer on the Earth's surface, compared to one in deep space, is negligible. For an observer on the surface of a neutron star (whose radius might be only 10 km, while its mass could be something like $1.4 M_{\odot}$) the effect is extremely significant. However, even though the effect on Earth is extremely tiny, it must be taken into account for the proper working of satellite navigation based on the Global Positioning System (GPS). This relies on accurate timing of signals coming from a network of satellites and received by an observer who wants to know where on the Earth's surface she is. Relativistic effects need to be taken into account for this to be accurate, first because the satellites are in motion with respect to the ground-based observer (special relativity correction) and second because the satellites experience a lower gravitational field than the ground-based observer (general relativity correction due to the gravitational redshift).
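To get a feel for the sizes involved, the following minimal Python sketch evaluates the factor in eqn 6.12 for the two cases contrasted above. The physical constants, the Earth's mass and radius, and the function names are assumptions introduced here for illustration; the neutron-star inputs are the rough values quoted in the text. For the neutron star the weak-field condition is only marginally satisfied, so that number should be read as indicative.

```python
import math

G = 6.674e-11      # gravitational constant in m^3 kg^-1 s^-2 (assumed value)
C = 2.998e8        # speed of light in m/s (assumed value)
M_SUN = 1.989e30   # solar mass in kg (assumed value)


def schwarzschild_ratio(mass_kg, radius_m):
    """Return r_S / r = 2GM / (r c^2), which is also (v_esc / c)^2."""
    return 2.0 * G * mass_kg / (radius_m * C**2)


def dilation_factor(mass_kg, radius_m):
    """Return d(tau)/dt = sqrt(1 - r_S/r) for a clock held fixed at radius r."""
    x = schwarzschild_ratio(mass_kg, radius_m)
    if x >= 1.0:
        raise ValueError("radius is at or inside the Schwarzschild radius")
    return math.sqrt(1.0 - x)


def fractional_slowing(mass_kg, radius_m):
    """Return 1 - d(tau)/dt, using x / (1 + sqrt(1 - x)) to limit round-off for tiny x."""
    x = schwarzschild_ratio(mass_kg, radius_m)
    return x / (1.0 + dilation_factor(mass_kg, radius_m))


# Illustrative inputs: Earth's mass and radius are assumed round values.
earth = fractional_slowing(5.972e24, 6.371e6)
neutron_star = fractional_slowing(1.4 * M_SUN, 1.0e4)

print(f"Earth surface:        clock slow by a fraction {earth:.3e}")
print(f"Neutron star surface: clock slow by a fraction {neutron_star:.3f}")
```

With these inputs the Earth value is a few parts in $10^{10}$, while the neutron-star value is a sizeable fraction of unity, in line with the qualitative statements above.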
This is the second form of redshift we have encountered. The first (seen in Exercise 4.8) was due to the Doppler effect in flat spacetime. We will encounter a third form of redshift when we discuss cosmology; that form results from the expansion of spacetime itself over very large distances. Gravitational and cosmological redshift are effects due to the change in metric in spacetime, which is different to the special relativistic Doppler effect which is due to the velocity of sources and observers.
In the next chapter, we will continue to explore the properties of curved spacetime and find out how to define a derivative of a vector in a generally covariant way. As we have learnt in this chapter, it is only generally covariant quantities that will be admissible in any theory that aspires to describe the physical universe.

Chapter summary

  • The principle of equivalence tells us that in every local inertial frame all non-gravitational laws of physics must take on their special relativistic forms.
  • The principle of general covariance tells us that laws must be preserved in different coordinate systems.
  • We have used these principles to describe time dilation in a gravitational field.

Exercises

(6.1) Estimate the gravitational redshift (the factor by which a clock in a gravitational field runs slow compared to one subject to zero gravitational field) for the following cases: (a) a clock on the surface of the Earth; (b) a clock on the surface of the Sun; (c) a clock on the surface of a solar mass white dwarf with radius $10^{3} \mathrm{~km}$.
(6.2) A recent experiment uses clouds of ${ }^{87} \mathrm{Sr}$ atoms at around 100 nK, loaded into an optical lattice and operated as a sophisticated atomic clock [T. Bothwell et al., Nature 602, 420 (2022)]. It is possible to measure the gravitational redshift across the millimetre scale of this system, and the laboratory experiment gives a value of the frequency gradient of around $-1.0(2) \times 10^{-19} \mathrm{~mm}^{-1}$. Is this consistent with what you would expect from general relativity?
(6.3) A satellite is in a circular orbit of radius $r$ around a planet of radius $R$ and mass $M$. Show that a clock on the satellite runs faster than a clock on the surface of the planet, located at one of the poles, by a factor of approximately $1+\left(G M / c^{2}\right)[1 / R-3 /(2 r)]$. Hence show that there is one possible orbit radius for which the two clocks run at the same rate. Hint: You not only need the gravitational time dilation but also the effect due to the satellite moving (i.e. the special relativity time dilation), which is (at least instantaneously) in a straight line.
Estimate the factor for a geostationary satellite orbiting around the Earth.
(6.4) Consider the Schwarzschild metric line element which describes the spacetime around spherically symmetric stars (here we have taken $G=c=1$)
\begin{equation*} \mathrm{d} s^{2}=-\left(1-\frac{2 M}{r}\right) \mathrm{d} t^{2}+\left(1-\frac{2 M}{r}\right)^{-1} \mathrm{~d} r^{2}+r^{2}\left(\mathrm{~d} \theta^{2}+\sin ^{2} \theta \mathrm{~d} \phi^{2}\right) . \end{equation*}
(a) What is the proper time interval, measured by an observer at rest, between events at coordinate time $t$ and $t+\mathrm{d} t$ that both occur at a point $(r, \theta, \phi)$?
(b) Now consider two observers at rest in this spacetime. An atom undergoes an atomic transition at position $\left(r_{2}, \theta, \phi\right)$. What is the time interval between two successive wavefronts measured at point $\left(r_{2}, \theta, \phi\right)$?
(c) What is the interval between wavefronts measured at $r_{2}$ from the experiment that takes place at $r_{1}$?
Fig. 6.6 Light signal sent from $A$ to $B$ and back to $A$ (Exercises 6.5 and 6.6).
(6.5) Consider a measurement of length involving a light signal being sent from point $A$ to $B$ and then back to $A$, as shown in Fig. 6.6. Multiplying $c$ by the time that the observer at $A$ measures for this process gives twice the distance between points.
(a) By considering the interval of coordinate time that elapses for a signal sent between $A$ and $B$, show that we obtain
\begin{equation*} \mathrm{d} t=\frac{1}{g_{00}}\left\{-g_{0 i} \mathrm{~d} x^{i} \pm\left[\left(g_{0 i} g_{0 j}-g_{i j} g_{00}\right) \mathrm{d} x^{i} \mathrm{~d} x^{j}\right]^{\frac{1}{2}}\right\} . \end{equation*}
(b) What do the two roots correspond to?
(c) Show that the corresponding proper time interval measured by the observer at $A$ for the signal to be sent and received back is
\begin{equation*} \mathrm{d} \tau=-\frac{2}{g_{00}}\left[\left(g_{0 i} g_{0 j}-g_{i j} g_{00}\right) \mathrm{d} x^{i} \mathrm{~d} x^{j}\right]^{\frac{1}{2}} \tag{6.16} \end{equation*}
(d) Show that this leads to a measured length interval of
\begin{equation*} \mathrm{d} l^{2}=\left(g_{i j}-\frac{g_{0 i} g_{0 j}}{g_{00}}\right) \mathrm{d} x^{i} \mathrm{~d} x^{j} \tag{6.17} \end{equation*}
If the $g_{i j}$ depend on $x^{0}$, so that the spatial components are time dependent, it would not make sense to integrate this expression to obtain a general expression for proper time, since the integral would depend on the world line between the two points in space.
(6.6) Consider again the set-up in Exercise 6.5 with light signals sent between $A$ and $B$, and call the time on $B$'s world line when the light signal is received $x^{0}$. We define the time on $A$'s world line that is simultaneous to this to be halfway between emission and reception of the light signals.
(a) Show that this time is given by
\begin{equation*} x^{0}-\int \frac{g_{0 i}}{g_{00}} \mathrm{~d} x^{i} \tag{6.18} \end{equation*}
Attempting to use this formula to synchronize clocks on a closed path, such as a rotating disc, will fail, since the integral will not vanish.
(b) Using the metric for the rotating reference frame from Exercise 3.5, show that the discrepancy over one circuit is
\begin{equation*} \Delta t=\oint \frac{\Omega r^{2}}{1-\Omega^{2} r^{2}} \mathrm{~d} \theta \approx 2 \pi \Omega r^{2} \tag{6.19} \end{equation*}
when $\Omega r \ll 1$. To the same level of approximation, the discrepancy in proper time is $\Delta \tau=\sqrt{-g_{t t}}\, \Delta t \approx \Delta t$.
(c) By comparing the optical path lengths of two counter-propagating beams along a rotating circular fibre, show that the rotation causes a shift in their interference pattern of
\begin{equation*} \Delta N=\frac{4 \pi \Omega r^{2}}{\lambda} \tag{6.20} \end{equation*}
where $\lambda$ is the wavelength of the light. This shift is known as the Sagnac effect after Georges Sagnac (1869-1928).

Parallel lines and the covariant derivative

We never remark any passion or principle in others, of which, in some degree or other, we may not find a parallel in ourselves.
David Hume (1711-1776) A Treatise of Human Nature
Comparisons are odorous
William Shakespeare (1564-1616)
Much Ado About Nothing III:5

In order to describe physical quantities in general relativity, we shall need to evaluate mathematical objects (functions, vector and tensor fields, and so forth) at particular points in spacetime. We shall also identify differential equations for these objects to understand how they change with position in spacetime. This requires the notion of a derivative. When spacetime is curved, some of our basic assumptions about vectors and their derivatives break down. This chapter is concerned with finding a method to take derivatives of vectors with respect to position in spacetime in cases where the spacetime is curved.

Example 7.1

Probably the most familiar example of a curved space is the one in which we live: the Earth's surface. In navigating around our home town, we might use a street map, the coordinates for which are based on a two-dimensional rectangular grid based on north-south and east-west axes. But this street map only works locally and can't be extended to the whole planet because of the curvature of the Earth. Nevertheless, we can imagine smoothly transitioning between lots of small rectangular maps to cover the whole surface of the globe.
In much the same way, $(3+1)$-dimensional spacetime can be covered smoothly using lots of locally flat maps based on four coordinates. A spacetime that can be covered by a smoothly changing set of coordinates is known in mathematics as a manifold.$^{1}$ The study of smoothly changing spaces is known as differential geometry and is the subject of Part V of this book. Owing to its smoothness, a manifold describing spacetime is necessarily flat over a sufficiently small region, across which it looks identical to the Minkowski spacetime encountered in the first part of this book. Over larger distances, however, the spacetime might be curved, and it is this curvature that we describe over the next few chapters, with the derivative formulated in this chapter an essential first step.
$^{2}$ For example, if we think of the surface of the Earth then a straight-line path from New York to Paris would plough through the Earth, burrowing hundreds of kilometres underground. This directed straight-line path is then outside the space we are trying to describe.
Fig. 7.1 A tangent vector $\boldsymbol{t}$ is parallel transported around a surface. Its components are always the same in the local coordinate system, but $\boldsymbol{t}$ changes when viewed in the $(X, Z)$ coordinate system set up by observers able to embed the space in higher dimensions.
$^{3}$ Imagine a tourist using a street map in London. In absolute terms, their North-direction is rather different to the North-direction of an analogous tourist in Tokyo, even though it is analogously defined in both city street maps as a particular vector tangent to the Earth. The North-direction could be thought of as being parallel transported between the two cities.
Fig. 7.2 Failure of parallelism for a sphere embedded in $\mathbb{R}^{3}$. The vectors drawn on the surface of the sphere all point in the same absolute direction, according to the embedding in $\mathbb{R}^{3}$, but they only lie in the tangent plane at one point; more often, they are pointing out of the tangent plane.

7.1 Parallelism

Consider a curved spacetime where observers are confined. The picture to have in mind is of ants confined to a two-dimensional surface such as a football, or of humans confined to the surface of the Earth. All measurements are to be made in the surface, so the observers are not allowed to float above it to take advantage of its being embedded in three-dimensional space. The idea of a vector as a directed straight line joining two points is fine for flat space, but ceases to be of much use in a curved space.$^{2}$ The notion of a vector as a tangent to a path is, however, of much more use. Picture how the tangent vector to a path on a two-dimensional surface embedded in three-dimensional space will change its direction as we move it around the curved space, in order for it to still lie in the tangent plane of the surface. This behaviour of the tangent vector will provide a measure of vectors being parallel.
Next, we imagine that the path in the surface is one that doesn't change direction according to the observers (e.g. a great circle on a sphere, as shown in Fig. 7.1). The tangent vector of this path should, according to the trapped ants, be 'the same', or parallel, at all points on the path. A tangent vector to the path at some point may then be transported to a different point on the path and, if it is identical to the tangent vector determined at its new position, then we say that the vector has been parallel transported (Fig. 7.1). From the point of view of the observers confined to the surface, two vectors can then be compared at two different points in spacetime.
To generalize beyond tangent vectors: our trapped ants set up a coordinate system with which to make measurements. Measurements are always made locally, so they parallel transport their set of axes to the point where they want to measure the orientation of a vector. (The local coordinate system might, for example, use the tangent vector described above, and another axis in the surface orthogonal to this direction.) From this point of view, any vector that has the same components in each of the local coordinate systems is judged to be parallel at the different points. However, these vectors appear to change direction when we view the surface as embedded in a higher-dimensional space, just as the coordinate system used by the ants on the surface appears to change with position (Fig. 7.1).$^{3}$ This concept of parallelism will be included when we describe derivatives in curved spacetime, since the derivative of a field of parallel vectors should come out to be zero.

Example 7.2

The vectors in Fig. 7.2 are all parallel in three-dimensional space $\mathbb{R}^{3}$. However, for observers living on the surface of the sphere, the vectors are not parallel: they all make different angles to the tangent plane of the sphere's surface. The result of correctly parallel transporting a vector along several paths on a spherical surface is shown in Fig. 7.3. The components of the vector are identical in each of the local coordinate systems that the confined observers set up.

7.2 Derivatives and connections

We now turn to a method to evaluate the change in a vector with position. In order to evaluate, via a derivative, the change in a vector as it is transported around, we need to disentangle two effects. The first is the intrinsic change in the vector with position, which is what we want the derivative to output. The second is the change in the vector reflecting the fact that the coordinates (or, equivalently, the basis vectors) change in different parts of space. To extract the intrinsic change we define a new kind of derivative of the vector. This is the covariant derivative, which evaluates the intrinsic change in the vector $\boldsymbol{v}$. We can make sense of the covariant derivative conceptually as
\begin{equation*} \binom{\text{Covariant}}{\text{derivative of } \boldsymbol{v}}_{\boldsymbol{u}}=\binom{\text{Change in}}{\text{vector } \boldsymbol{v}}-\binom{\text{Change due to}}{\text{coordinate system}} . \end{equation*}
This derivative is directional: it tells us the change in $\boldsymbol{v}$ as we move along the vector $\boldsymbol{u}$ (hence the subscript in the previous equation).

Example 7.3

The covariant derivative in a curved space generalizes the notion of a directional derivative in ordinary calculus. The gradient of a function $f(x, y, z)$ in Euclidean 3-space is given by
\begin{equation*} \vec{\nabla} f(x, y, z)=\frac{\partial f}{\partial x} \vec{e}_{x}+\frac{\partial f}{\partial y} \vec{e}_{y}+\frac{\partial f}{\partial z} \vec{e}_{z} . \tag{7.2} \end{equation*}
This is interpreted as a vector normal to the tangent plane of the surface of constant $f$. If we want to know the change of $f(x, y, z)$ along a particular vector $\vec{u}$ we use the directional derivative, defined as
\begin{equation*} \vec{u} \cdot \vec{\nabla} f=u^{x} \frac{\partial f}{\partial x}+u^{y} \frac{\partial f}{\partial y}+u^{z} \frac{\partial f}{\partial z} \tag{7.3} \end{equation*}
This can be thought of as
\begin{equation*} \vec{u} \cdot \vec{\nabla} f=\binom{\text{Value of } f}{\text{at tip of } \vec{u}}-\binom{\text{Value of } f}{\text{at base of } \vec{u}} \tag{7.4} \end{equation*}
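The 'tip minus base' reading of eqn 7.4 can be checked numerically. The sketch below compares eqn 7.3 with the difference in $f$ between the tip and base of a very short copy of $\vec{u}$, rescaled by its length factor. The scalar field `f`, the chosen point, and the step size are hypothetical choices made purely for illustration.

```python
import math


def f(x, y, z):
    """A hypothetical smooth scalar field, chosen only for illustration."""
    return x**2 * y + math.sin(z)


def grad_f(x, y, z):
    """Analytic gradient of f, component by component (eqn 7.2)."""
    return (2.0 * x * y, x**2, math.cos(z))


def directional_derivative(point, u):
    """u . grad f evaluated at the given point (eqn 7.3)."""
    return sum(ui * gi for ui, gi in zip(u, grad_f(*point)))


def tip_minus_base(point, u, eps=1e-6):
    """(f at tip of eps*u minus f at base) / eps, the picture behind eqn 7.4."""
    tip = tuple(p + eps * ui for p, ui in zip(point, u))
    return (f(*tip) - f(*point)) / eps


point = (1.0, 2.0, 0.5)
u = (0.3, -0.4, 1.2)

exact = directional_derivative(point, u)
approx = tip_minus_base(point, u)
print(f"u . grad f      = {exact:.6f}")
print(f"tip minus base  = {approx:.6f}")
```

The two numbers agree up to an error that shrinks with `eps`, which is the sense in which eqn 7.4 holds for a sufficiently short vector.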
Now to take the derivative. Consider the vector $\boldsymbol{v}=v^{\mu} \boldsymbol{e}_{\mu}$. We take a derivative with respect to the coordinates, allowing both the components $v^{\mu}$ and the basis vectors $\boldsymbol{e}_{\mu}$ to change in spacetime. The derivative we shall take is $\partial / \partial x^{\alpha}$, which should be thought of as the directional derivative along the direction $\boldsymbol{e}_{\alpha}$. Employing the Leibniz product rule, we have
\begin{equation*} \frac{\partial \boldsymbol{v}}{\partial x^{\alpha}}=\frac{\partial v^{\mu}}{\partial x^{\alpha}} \boldsymbol{e}_{\mu}+v^{\mu} \frac{\partial \boldsymbol{e}_{\mu}}{\partial x^{\alpha}} \tag{7.5} \end{equation*}
The tricky second term on the right is due to the change in basis vectors with position. The derivative of the basis vector $\boldsymbol{e}_{\mu}$ can have components along any of the basis vectors, so to express this we define connection coefficients, also known as Christoffel symbols,$^{4}$ denoted $\Gamma^{\mu}{ }_{\alpha \beta}$, and
Fig. 7.3 Parallel transport of a vector on a spherical surface.
$^{4}$ Elwin Bruno Christoffel (1829-1900). The mathematics described here was invented by Christoffel in 1869 and further explored by Gregorio Ricci-Curbastro (1853-1925) who, in the years leading to 1900, developed the mathematical machinery employed by Einstein in developing general relativity. We shall follow the modern convention of calling the symbols 'connection coefficients' in this book.
then write the change in basis vectors as
\begin{equation*} \frac{\partial \boldsymbol{e}_{\mu}}{\partial x^{\alpha}}=\Gamma^{\lambda}{ }_{\alpha \mu} \boldsymbol{e}_{\lambda} . \tag{7.6} \end{equation*}
The connection coefficients encode all of the information describing how the coordinates change as we move around spacetime. Another way of thinking about this is that, since there are different local coordinate systems at different points in space, the connection coefficients tell us how the coordinate systems are connected, that is, how to translate between coordinate systems as we move around.$^{5}$

Example 7.4

In the following two chapters, we shall find a simple and efficient means of extracting connection coefficients. Before we get to that, here is a simple, 'brute force and ignorance' example, based on eqn 7.6. We saw in Chapter 3 that for a plane-polar coordinate system, the basis vectors have derivatives$^{6}$
\begin{equation*} \begin{array}{ll} \dfrac{\partial \boldsymbol{e}_{r}}{\partial r}=0, & \dfrac{\partial \boldsymbol{e}_{r}}{\partial \theta}=\dfrac{\boldsymbol{e}_{\theta}}{r}, \\[2ex] \dfrac{\partial \boldsymbol{e}_{\theta}}{\partial r}=\dfrac{\boldsymbol{e}_{\theta}}{r}, & \dfrac{\partial \boldsymbol{e}_{\theta}}{\partial \theta}=-r \boldsymbol{e}_{r} . \end{array} \tag{7.7} \end{equation*}
This allows us to write down the connection coefficients. Using eqn 7.6 we can read off
\begin{equation*} \begin{array}{lll} \Gamma^{r}{ }_{r r}=0, & \Gamma^{\theta}{ }_{r r}=0, & \Gamma^{r}{ }_{\theta r}=0, \\ \Gamma^{r}{ }_{r \theta}=0, & \Gamma^{\theta}{ }_{r \theta}=\dfrac{1}{r}, & \Gamma^{r}{ }_{\theta \theta}=-r, \\ \Gamma^{\theta}{ }_{\theta r}=\dfrac{1}{r}, & \Gamma^{\theta}{ }_{\theta \theta}=0 . & \end{array} \tag{7.8} \end{equation*}
Although Euclidean space described by the plane polar coordinates is flat, we still have non-zero connection coefficients. The presence of connection coefficients therefore does not alone tell us whether a space is curved.$^{7}$
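The 'brute force' reading-off of connection coefficients from eqn 7.6 can also be done numerically. The sketch below is one possible illustration, not the method of the following chapters: it writes the plane-polar coordinate basis vectors in Cartesian components, differentiates them by central differences, and expands each derivative in the local basis. The function names, the step size `h`, and the sample point are assumptions made for the example.

```python
import math


def basis(r, theta):
    """Coordinate basis vectors e_r and e_theta, as Cartesian (x, y) components."""
    e_r = (math.cos(theta), math.sin(theta))
    e_theta = (-r * math.sin(theta), r * math.cos(theta))
    return e_r, e_theta


def expand(vec, r, theta):
    """Return (a, b) such that vec = a e_r + b e_theta at the point (r, theta)."""
    e_r, e_t = basis(r, theta)
    det = e_r[0] * e_t[1] - e_r[1] * e_t[0]   # equals r for this basis
    a = (vec[0] * e_t[1] - vec[1] * e_t[0]) / det
    b = (e_r[0] * vec[1] - e_r[1] * vec[0]) / det
    return a, b


def connection(r, theta, h=1e-6):
    """Return a dict gamma[(lam, alpha, mu)] approximating Gamma^lam_{alpha mu},
    defined by d e_mu / d x^alpha = Gamma^lam_{alpha mu} e_lam (eqn 7.6)."""
    names = ("r", "theta")
    gamma = {}
    for a, alpha in enumerate(names):
        dr, dth = (h, 0.0) if a == 0 else (0.0, h)
        plus = basis(r + dr, theta + dth)
        minus = basis(r - dr, theta - dth)
        for m, mu in enumerate(names):
            # Central-difference derivative of e_mu along x^alpha.
            deriv = tuple((p - q) / (2.0 * h) for p, q in zip(plus[m], minus[m]))
            comps = expand(deriv, r, theta)
            for k, lam in enumerate(names):
                gamma[(lam, alpha, mu)] = comps[k]
    return gamma


coeffs = connection(2.0, 0.7)
for key in sorted(coeffs):
    print(key, round(coeffs[key], 6))
```

At $r=2$ the only entries that are not numerically zero are $\Gamma^{\theta}{ }_{r \theta}$ and $\Gamma^{\theta}{ }_{\theta r}$, close to $1 / r$, and $\Gamma^{r}{ }_{\theta \theta}$, close to $-r$, which matches the pattern read off by hand.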
In all of the coordinate frames that we examine in this book, the connection coefficients have the property
\begin{equation*} \Gamma^{\mu}{ }_{\alpha \beta}=\Gamma^{\mu}{ }_{\beta \alpha} . \tag{7.9} \end{equation*}
A connection with this property is often called symmetric or torsion free.
Finally, we ask why we call the $\Gamma$s connection coefficients or Christoffel symbols. The answer is that, unlike most of the objects we deal with in relativity, they are not the components of a tensor.$^{8}$
Example 7.5
Usually, we expect a tensor's components to transform as
\begin{equation*} T^{\alpha^{\prime}}{ }_{\gamma^{\prime} \beta^{\prime}}=\frac{\partial x^{\alpha^{\prime}}}{\partial x^{\alpha}} \frac{\partial x^{\beta}}{\partial x^{\beta^{\prime}}} \frac{\partial x^{\gamma}}{\partial x^{\gamma^{\prime}}} T^{\alpha}{ }_{\gamma \beta} . \tag{7.10} \end{equation*}
We'll see in Part V that the components of the connection actually transform as
\begin{equation*} \Gamma^{\alpha^{\prime}}{ }_{\gamma^{\prime} \beta^{\prime}}=\frac{\partial x^{\alpha^{\prime}}}{\partial x^{\alpha}}\left(\frac{\partial^{2} x^{\alpha}}{\partial x^{\beta^{\prime}} \partial x^{\gamma^{\prime}}}+\frac{\partial x^{\beta}}{\partial x^{\beta^{\prime}}} \frac{\partial x^{\gamma}}{\partial x^{\gamma^{\prime}}} \Gamma^{\alpha}{ }_{\gamma \beta}\right) \tag{7.11} \end{equation*}
This shows us that the first term in the brackets, which involves second derivatives of the coordinates, spoils the tensor transformation law. Therefore, the $\Gamma$s are not the components of a tensor in general.

7.3 The covariant derivative

We now have the tools to finally take a covariant derivative. Substituting eqn 7.6 into eqn 7.5 we have a derivative
\begin{equation*} \frac{\partial \boldsymbol{v}}{\partial x^{\alpha}}=\frac{\partial v^{\mu}}{\partial x^{\alpha}} \boldsymbol{e}_{\mu}+v^{\mu} \Gamma^{\lambda}{ }_{\alpha \mu} \boldsymbol{e}_{\lambda}, \tag{7.12} \end{equation*}
and by relabelling the dummy indices
\begin{equation*} \frac{\partial \boldsymbol{v}}{\partial x^{\alpha}}=\left(\frac{\partial v^{\mu}}{\partial x^{\alpha}}+v^{\lambda} \Gamma^{\mu}{ }_{\alpha \lambda}\right) \boldsymbol{e}_{\mu} \tag{7.13} \end{equation*}
From here on we shall write$^{9}$ this quantity as $\boldsymbol{\nabla}_{\alpha} \boldsymbol{v}$, which we call the covariant derivative of $\boldsymbol{v}$ along the direction $\boldsymbol{e}_{\alpha}$. This new notation allows us to write the left-hand side of eqn 7.13 as $\boldsymbol{\nabla}_{\alpha} \boldsymbol{v}$ rather than as $\partial \boldsymbol{v} / \partial x^{\alpha}$, which turns out to be much more convenient.$^{10}$ While we are in the process of introducing notation, a commonly used labour-saving shorthand for writing derivatives using commas and semicolons is given in the shaded box in the margin. Putting everything together, we have
\begin{equation*} \boldsymbol{\nabla}_{\alpha} \boldsymbol{v}=\left(\frac{\partial v^{\mu}}{\partial x^{\alpha}}+v^{\lambda} \Gamma^{\mu}{ }_{\alpha \lambda}\right) \boldsymbol{e}_{\mu} \tag{7.18} \end{equation*}
In terms of our classification of tensors, notice that $\boldsymbol{\nabla}_{\alpha} \boldsymbol{v}$ is a $(1,0)$ object, just like a vector.
Example 7.6
We can immediately note that in Minkowski spacetime the Cartesian basis vectors do not change with position and so all of the $\Gamma$ coefficients are zero, giving the result that
\begin{equation*} \boldsymbol{\nabla}_{\alpha} \boldsymbol{v}=\frac{\partial v^{\mu}}{\partial x^{\alpha}} \boldsymbol{e}_{\mu} \quad \text{(flat spacetime).} \tag{7.19} \end{equation*}
We have worked out the covariant derivative along the direction of the basis vector $\boldsymbol{e}_{\alpha}$. What about the covariant derivative along an arbitrary vector? For this purpose we define the connection operator $\boldsymbol{\nabla}$ by writing $\boldsymbol{e}_{\alpha} \cdot \boldsymbol{\nabla}=\boldsymbol{\nabla}_{\alpha}$. We then define the action of our covariant derivative $\boldsymbol{\nabla}_{\alpha}$ on a scalar function simply as the derivative with respect to $x^{\alpha}$, or
\begin{equation*} \boldsymbol{e}_{\alpha} \cdot \boldsymbol{\nabla} f=\boldsymbol{\nabla}_{\alpha} f=\frac{\partial f}{\partial x^{\alpha}} . \tag{7.20} \end{equation*}
Generalizing to different directions by replacing $\boldsymbol{e}_{\alpha}$ with an arbitrary vector $\boldsymbol{u}$, we have the directional derivative
\begin{equation*} \boldsymbol{\nabla}_{\boldsymbol{u}} f=\boldsymbol{u} \cdot \boldsymbol{\nabla} f=u^{\alpha} \boldsymbol{e}_{\alpha} \cdot \boldsymbol{\nabla} f=u^{\alpha} \frac{\partial f}{\partial x^{\alpha}} \tag{7.21} \end{equation*}
If we now interpret the action of the connection operator on vectors in the same way, $\boldsymbol{\nabla}_{\boldsymbol{u}} \boldsymbol{v}$ is the covariant derivative of the vector $\boldsymbol{v}$ along the direction $\boldsymbol{u}$. We write this as
\begin{equation*} \boldsymbol{\nabla}_{\boldsymbol{u}} \boldsymbol{v}=\boldsymbol{u} \cdot \boldsymbol{\nabla} \boldsymbol{v}=u^{\alpha} \boldsymbol{e}_{\alpha} \cdot \boldsymbol{\nabla} \boldsymbol{v}=u^{\alpha} \boldsymbol{\nabla}_{\alpha} \boldsymbol{v} \tag{7.22} \end{equation*}
$^{9}$ We take $\boldsymbol{\nabla}_{\alpha}$ to be short for the more cumbersome
\begin{equation*} \boldsymbol{\nabla}_{\alpha} \equiv \boldsymbol{\nabla}_{\boldsymbol{e}_{\alpha}} . \tag{7.14} \end{equation*}
In words, this is the directional derivative along the direction $\boldsymbol{e}_{\alpha}$.
$^{10}$ Relating the old notation to the new notation, we have
\begin{equation*} \frac{\partial \boldsymbol{v}}{\partial x^{\alpha}} \equiv\left(\boldsymbol{\nabla}_{\alpha} \boldsymbol{v}\right)^{\mu} \boldsymbol{e}_{\mu} \tag{7.15} \end{equation*}
Important here is that the components of the derivative are not necessarily equal to the derivatives of the components. This is due to the non-zero connection coefficients. In fact, we can rewrite our definition of the derivative of components and basis vectors in the new notation
\begin{equation*} \boldsymbol{\nabla}_{\alpha}\left(v^{\mu}\right)=\frac{\partial v^{\mu}}{\partial x^{\alpha}} \tag{7.16} \end{equation*}
and
\begin{equation*} \boldsymbol{\nabla}_{\alpha}\left(\boldsymbol{e}_{\mu}\right)=\Gamma^{\lambda}{}_{\alpha \mu} \boldsymbol{e}_{\lambda} \tag{7.17} \end{equation*}

Commas and semicolons:

Ordinary derivatives of functions can be written in comma notation as
\begin{equation*} \frac{\partial f}{\partial x^{\alpha}} \equiv f_{, \alpha} . \end{equation*}
The covariant derivative is written in semicolon notation so that the $\mu$ component of eqn 7.18 becomes
\begin{equation*} \left(\boldsymbol{\nabla}_{\alpha} \boldsymbol{v}\right)^{\mu} \equiv v^{\mu}{}_{; \alpha}=v^{\mu}{}_{, \alpha}+v^{\lambda} \Gamma^{\mu}{}_{\alpha \lambda}, \end{equation*}
and so $v^{\mu}{}_{; \alpha}$ are the components of the covariant derivative. This notation has the virtue of allowing some equations to be written more compactly, though this comes at the expense of leaving expressions littered with punctuation marks.
$^{11}$ In semicolon notation
\begin{equation*} \boldsymbol{\nabla}_{\boldsymbol{u}} \boldsymbol{v}=u^{\alpha} v^{\mu}{}_{; \alpha} \boldsymbol{e}_{\mu}, \end{equation*}
where as before we have
\begin{equation*} v^{\mu}{}_{; \alpha}=v^{\mu}{}_{, \alpha}+v^{\lambda} \Gamma^{\mu}{}_{\alpha \lambda} . \end{equation*}
and conclude that the covariant derivative along a vector $\boldsymbol{u}$ is given in terms of components by$^{11}$
\begin{equation*} \boldsymbol{\nabla}_{\boldsymbol{u}} \boldsymbol{v}=u^{\alpha}\left(\frac{\partial v^{\mu}}{\partial x^{\alpha}}+v^{\lambda} \Gamma^{\mu}{}_{\alpha \lambda}\right) \boldsymbol{e}_{\mu} \tag{7.23} \end{equation*}
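To see eqn 7.23 at work, the following sketch evaluates it numerically in plane polar coordinates for the constant Cartesian field $\boldsymbol{v}=\boldsymbol{e}_{x}$. It assumes the coordinate-basis polar coefficients $\Gamma^{r}{}_{\theta \theta}=-r$ and $\Gamma^{\theta}{}_{r \theta}=\Gamma^{\theta}{}_{\theta r}=1 / r$; the function names and the finite-difference step `h` are choices made for this example only.

```python
import math

def gamma(r):
    """Polar connection coefficients, indexed G[mu][alpha][lam] = Gamma^mu_{alpha lam}."""
    G = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(2)]
    G[0][1][1] = -r        # Gamma^r_{theta theta}
    G[1][0][1] = 1.0 / r   # Gamma^theta_{r theta}
    G[1][1][0] = 1.0 / r   # Gamma^theta_{theta r}
    return G

def v(r, theta):
    """Polar components (v^r, v^theta) of the constant Cartesian field e_x."""
    return [math.cos(theta), -math.sin(theta) / r]

def covariant_derivative(u, r, theta, h=1e-6):
    """Components of nabla_u v from eqn 7.23, using central finite differences."""
    # dv[alpha][mu] approximates d v^mu / d x^alpha.
    dv = [
        [(a - b) / (2 * h) for a, b in zip(v(r + h, theta), v(r - h, theta))],
        [(a - b) / (2 * h) for a, b in zip(v(r, theta + h), v(r, theta - h))],
    ]
    G, comps = gamma(r), v(r, theta)
    return [
        sum(
            u[al] * (dv[al][mu] + sum(comps[lam] * G[mu][al][lam] for lam in range(2)))
            for al in range(2)
        )
        for mu in range(2)
    ]

print(covariant_derivative([0.3, -1.2], 2.0, 0.7))
```

The partial derivatives of the polar components are not zero, but the connection terms cancel them, so both printed components vanish to within finite-difference error: the field has no genuine change, only a change in how it is described.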
The covariant derivative selects out the change in a vector with position owing to its genuine change, eliminating the contribution due to the changing of coordinates with position. If a vector is parallel transported along a path then the only change should be due to the coordinates changing. So parallel transport of a vector $\boldsymbol{v}$ along the direction $\boldsymbol{u}$ implies
\begin{equation*} \boldsymbol{\nabla}_{\boldsymbol{u}} \boldsymbol{v}=0 \quad \text{(parallel transport)} \tag{7.24} \end{equation*}
Example 7.7
This latter expression means that the components obey
\begin{gather*} u^{\alpha} \frac{\partial v^{\mu}}{\partial x^{\alpha}}+\Gamma^{\mu}{}_{\alpha \beta} v^{\beta} u^{\alpha}=0 \tag{7.25}\\ \frac{\partial v^{\mu}}{\partial x^{\alpha}}=-\Gamma^{\mu}{}_{\alpha \beta} v^{\beta} \tag{7.26} \end{gather*}
In words, the change in the components $\partial v^{\mu} / \partial x^{\alpha}$ is, in this case, entirely due$^{12}$ to the change in coordinates $-\Gamma^{\mu}{}_{\alpha \beta} v^{\beta}$. That is to say that when a vector is parallel transported we have
\begin{equation*} \binom{\text{Change in a}}{\text{vector's components}}=\binom{\text{Change due to}}{\text{coordinate system}} \tag{7.27} \end{equation*}
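The transport condition can be integrated numerically. The sketch below parallel transports a vector around part of a circle of radius $r$ in the flat plane, using polar coordinates with the assumed coordinate-basis coefficients $\Gamma^{r}{}_{\theta \theta}=-r$ and $\Gamma^{\theta}{}_{\theta r}=1 / r$, and taking $\boldsymbol{u}=\boldsymbol{e}_{\theta}$ so that $\theta$ is the path parameter. The integrator (classical fourth-order Runge-Kutta) and all function names are choices made for this illustration.

```python
import math

def transport_rhs(r, v):
    """Right-hand side of eqn 7.26 for u = e_theta on a circle of radius r:
    d v^mu / d theta = -Gamma^mu_{theta beta} v^beta."""
    v_r, v_th = v
    return [r * v_th, -v_r / r]

def parallel_transport(r, v0, theta_end, steps=1000):
    """Integrate the transport equation from theta = 0 to theta_end with RK4."""
    h = theta_end / steps
    v = list(v0)
    for _ in range(steps):
        k1 = transport_rhs(r, v)
        k2 = transport_rhs(r, [v[i] + 0.5 * h * k1[i] for i in range(2)])
        k3 = transport_rhs(r, [v[i] + 0.5 * h * k2[i] for i in range(2)])
        k4 = transport_rhs(r, [v[i] + h * k3[i] for i in range(2)])
        v = [v[i] + h * (k1[i] + 2 * k2[i] + 2 * k3[i] + k4[i]) / 6 for i in range(2)]
    return v

def to_cartesian(r, theta, v):
    """Convert polar components (v^r, v^theta) to Cartesian (v^x, v^y)."""
    v_r, v_th = v
    return [
        math.cos(theta) * v_r - r * math.sin(theta) * v_th,
        math.sin(theta) * v_r + r * math.cos(theta) * v_th,
    ]

v_end = parallel_transport(2.0, [1.0, 0.0], 1.3)
print(v_end, to_cartesian(2.0, 1.3, v_end))
```

The polar components change along the path, but converting back to Cartesian components recovers the starting vector $(1,0)$: in flat space the whole change in the components is due to the coordinate system, as eqn 7.27 states.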

7.4 Parametrized paths

We now have the covariant derivative at our disposal in the form of a directional derivative of some vector $\boldsymbol{v}$ taken along a vector $\boldsymbol{u}$. We shall also need the derivative in a form more suitable to apply to curves in spacetime such as the world lines of particles.
We saw in Chapter 1 that the most general way to describe a curve is to parametrize it by introducing a quantity that varies monotonically along its length. That is, a curve stretching from point $\lambda=a$ to $\lambda=b$ is written as $x(\lambda)$, where $\lambda$ parametrizes the curve. It marks off regular intervals, so we know how far along the curve we are, as shown in Fig. 7.4.

Example 7.8

For the two-dimensional space $\mathbb{R}^{2}$ a curve is given by $(x(\lambda), y(\lambda))$, i.e. with $x$ and $y$ both functions of $\lambda$.
  • A straight line $y=m x+c$ can be parametrized with $x(\lambda)=\lambda$ and $y(\lambda)=m \lambda+c$.
  • A parabola $y=x^{2}$ can be parametrized with $x(\lambda)=\lambda$ and $y(\lambda)=\lambda^{2}$.
  • A circle $x^{2}+y^{2}=a^{2}$ can be parametrized with $x(\lambda)=a \cos \lambda$ and $y(\lambda)=a \sin \lambda$.
The precise choice of parametrization isn't crucial. If an allowable parametrization is given by regular intervals of $\lambda$, we could equally well choose a different parametrization $\eta$ such that $\lambda=\alpha \eta+\beta$, where $\alpha$ and $\beta$ are constants. Such a parametrization is called an affine parametrization.
With this in mind, we can return to the covariant derivative itself. In many cases, we are interested in the rate of change of a vector field, $\boldsymbol{v}(x)$, an object where we input a position $x=\mathcal{P}$ and output a vector $\boldsymbol{v}$ appropriate for that point $\mathcal{P}$. We then ask how rapidly the vector field $\boldsymbol{v}(x)$ changes along a curve $x^{\mu}(\lambda)$. This involves parametrizing the curve, and then we seek
\begin{equation*} \binom{\text{Rate of change of}}{\boldsymbol{v} \text{ with respect to } \lambda} \equiv \frac{\mathrm{D} \boldsymbol{v}}{\mathrm{d} \lambda} \tag{7.28} \end{equation*}
Here we've introduced some new notation: the covariant derivative with respect to an affine parameter is denoted $\mathrm{D} / \mathrm{d} \lambda$.$^{13}$
In order to use the covariant derivative as we've defined it so far, we seek a vector telling us the direction along which to take the derivative. This is provided by the tangent vector to the curve $x^{\mu}(\lambda)$, given by$^{14}$
\begin{equation*} \boldsymbol{u}=\left(\frac{\mathrm{d} x^{\mu}(\lambda)}{\mathrm{d} \lambda}\right) \boldsymbol{e}_{\mu} \tag{7.29} \end{equation*}
That is, at every point $\lambda$ along the curve we have a tangent vector $\boldsymbol{u}$ (Fig. 7.5). This tangent vector is one of the most useful tools in this book. We then have
\begin{equation*} \binom{\text{Rate of change of}}{\boldsymbol{v} \text{ with respect to } \lambda} \equiv \frac{\mathrm{D} \boldsymbol{v}}{\mathrm{d} \lambda} \equiv \boldsymbol{\nabla}_{\boldsymbol{u}} \boldsymbol{v} \equiv\binom{\text{Covariant derivative of}}{\boldsymbol{v} \text{ along } \boldsymbol{u}}, \end{equation*}
where $\boldsymbol{u}$ is the tangent vector field to the curve. For massive particles, which follow timelike curves, we shall usually choose $\lambda$ to be the proper time $\tau$, which allows us to interpret the tangent as the particle's velocity, and provides the useful constraint $\boldsymbol{u} \cdot \boldsymbol{u}=-1$.
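As a quick check of the constraint $\boldsymbol{u} \cdot \boldsymbol{u}=-1$, the sketch below builds the tangent vector for a particle moving uniformly along $x$ in Minkowski spacetime, whose world line parametrized by proper time is $x^{\mu}(\tau)=(\gamma \tau, \gamma v \tau, 0,0)$. It assumes units with $c=1$ and the metric signature $(-,+,+,+)$ implied by the sign of the constraint; the function names are hypothetical.

```python
import math

def four_velocity(v):
    """Tangent vector u^mu = dx^mu/dtau for uniform motion at speed v along x (c = 1)."""
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    return [gamma, gamma * v, 0.0, 0.0]

def minkowski_dot(a, b):
    """Inner product in Cartesian Minkowski components, signature (-, +, +, +)."""
    return -a[0] * b[0] + sum(x * y for x, y in zip(a[1:], b[1:]))

u = four_velocity(0.6)
print(u, minkowski_dot(u, u))
```

The result is $-\gamma^{2}+\gamma^{2} v^{2}=-1$ whatever the speed, which is why choosing $\lambda=\tau$ fixes the normalization of the tangent vector.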
Example 7.9
This way of thinking about the covariant derivative makes it similar to an ordinary derivative defined in terms of evaluating a function at two points, $f(x)$ and $f(x+\delta x)$, and taking the difference in the limit of small $\delta x$. However, key to the definition here is the notion of parallelism, which allows us to remove the change caused by the changing coordinate system. In order to do this, the instructions for taking the covariant derivative, in these terms, are as follows:
(i) Take the vector $\boldsymbol{v}$ at $\lambda=\lambda_{0}+\varepsilon$.
(ii) Parallel transport it back to $\lambda_{0}$.
(iii) Evaluate $\delta \boldsymbol{v}$, which measures how different it is from $\boldsymbol{v}$ at $\lambda_{0}$.
(iv) Divide by $\varepsilon$ and take the limit.
In equations, we have
\begin{equation*} \frac{\mathrm{D} \boldsymbol{v}}{\mathrm{d} \lambda}=\boldsymbol{\nabla}_{\boldsymbol{u}} \boldsymbol{v}=\lim _{\varepsilon \rightarrow 0}\left(\frac{\boldsymbol{v}\left(\lambda_{0}+\varepsilon\right)_{\left(\text{parallel transport to } \lambda_{0}\right)}-\boldsymbol{v}\left(\lambda_{0}\right)}{\varepsilon}\right) \tag{7.31} \end{equation*}
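The four steps can be imitated numerically. The sketch below does so on a circle of radius `R` in the flat plane, in polar coordinates with $\boldsymbol{u}=\boldsymbol{e}_{\theta}$ (so $\lambda=\theta$), and compares the finite-$\varepsilon$ quotient with the component formula of eqn 7.23. The sample field, the radius, and the first-order transport step are all assumptions made for illustration, as are the coordinate-basis coefficients $\Gamma^{r}{}_{\theta \theta}=-r$ and $\Gamma^{\theta}{}_{\theta r}=1 / r$.

```python
import math

R = 2.0  # radius of the circular path (hypothetical choice)

def v(theta):
    """Polar components (v^r, v^theta) of a sample field on the circle r = R."""
    return [math.sin(theta), math.cos(2 * theta) / R]

def transport_back(vec, eps):
    """Step (ii): parallel transport vec from lambda0 + eps back to lambda0,
    to first order in eps, using delta v^mu = -Gamma^mu_{theta beta} v^beta * (-eps)."""
    v_r, v_th = vec
    return [v_r + eps * (-R * v_th), v_th + eps * (v_r / R)]

def derivative_by_limit(theta0, eps=1e-6):
    """Steps (i) to (iv): a finite-eps version of eqn 7.31."""
    back = transport_back(v(theta0 + eps), eps)
    return [(b - a) / eps for b, a in zip(back, v(theta0))]

def derivative_by_formula(theta0):
    """Components of eqn 7.23 with u = e_theta, using exact derivatives of v."""
    v_r, v_th = v(theta0)
    dv_r = math.cos(theta0)
    dv_th = -2 * math.sin(2 * theta0) / R
    return [dv_r + (-R) * v_th, dv_th + v_r / R]

print(derivative_by_limit(0.4), derivative_by_formula(0.4))
```

The two results agree up to terms of order $\varepsilon$, showing how the connection coefficients in eqn 7.23 encode exactly the correction supplied by the parallel-transport step.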
$^{13}$ The notation reminds us that owing to the changes in the coordinate systems, the components of the covariant derivative, $(\mathrm{D} \boldsymbol{v} / \mathrm{d} \lambda)^{\mu}$, will not generally be equivalent to the derivatives of components $\mathrm{d} v^{\mu} / \mathrm{d} \lambda$. However, for a scalar field $f$ we do have $\mathrm{d} f / \mathrm{d} \lambda=\mathrm{D} f / \mathrm{d} \lambda$.
Fig. 7.5 The tangent vectors $\boldsymbol{u}=\left(\mathrm{d} x^{\mu}(\lambda) / \mathrm{d} \lambda\right) \boldsymbol{e}_{\mu}$ along the curve parametrized by $\lambda$. For $\lambda=\tau$ this provides a velocity vector (which itself varies along the path).
$^{14}$ We do not write $\mathrm{d} \boldsymbol{x} / \mathrm{d} \lambda$ as we sometimes do in special relativity. As discussed in Chapter 3, the displacement vector $\boldsymbol{x}=x^{\mu} \boldsymbol{e}_{\mu}$, thought of as pointing a distance $|\boldsymbol{x}|$ from the origin to coordinate point $x^{\mu}$, does not transform appropriately, and so we won't use it in this form (e.g. by taking its derivative). Note also that the tangent vector is given by $\boldsymbol{u}=\left(\mathrm{D} x^{\mu}(\lambda) / \mathrm{d} \lambda\right) \boldsymbol{e}_{\mu}=\left(\mathrm{d} x^{\mu}(\lambda) / \mathrm{d} \lambda\right) \boldsymbol{e}_{\mu}$, since $x^{\mu}(\lambda)$ is a set of scalar functions.
Fig. 7.6 Taking the covariant derivative using eqn 7.31.
Example 7.10
We can check our new version of the covariant derivative of a vector $\boldsymbol{A}$ in the case of flat spacetime, where the connection coefficients expressed in Cartesian coordinates vanish. Writing out all the components, we have
\begin{align*} \left(\frac{\mathrm{D} \boldsymbol{A}}{\mathrm{d} \lambda}\right)^{\alpha}=\left(\boldsymbol{\nabla}_{\boldsymbol{u}} \boldsymbol{A}\right)^{\alpha} & =u^{\mu}\left(\frac{\partial A^{\alpha}}{\partial x^{\mu}}+\Gamma^{\alpha}{}_{\mu \nu} A^{\nu}\right) \\ & =u^{\mu} \frac{\partial A^{\alpha}}{\partial x^{\mu}} \quad\left(\text{the connection } \Gamma^{\alpha}{}_{\mu \nu}=0\right) \\ & =\frac{\mathrm{d} x^{\mu}}{\mathrm{d} \lambda} \frac{\partial A^{\alpha}}{\partial x^{\mu}}=\frac{\mathrm{d} A^{\alpha}}{\mathrm{d} \lambda}, \tag{7.32} \end{align*}
where we've used the chain rule in the final step. We conclude that for flat spacetime the components of the derivative are simply the derivatives of the components with respect to the parameter $\lambda$.
The covariant derivative notation $\mathrm{D} / \mathrm{d} \lambda$ proves very useful, not least because of its resemblance to the ordinary derivative.

7.5 Enter the metric

After formulating a covariant derivative, we might ask if this is the only way we could have constructed it. It turns out that our freedom to formulate it was restricted by the metric field $\boldsymbol{g}$, which is the foundation of our physical description of spacetime, and it is exactly this metric field that forces this version of the covariant derivative upon us. This idea is encapsulated in the notion of what is called the compatibility of the connection, which requires that the covariant derivative obeys
\begin{equation*} \boldsymbol{\nabla}_{\alpha} \boldsymbol{g}=0 \quad \text{or, in components,} \quad g_{\mu \nu ; \alpha}=0, \tag{7.33} \end{equation*}
for all $\alpha$. This equation inseparably joins the metric and the covariant derivative.$^{15}$ The importance of this condition is that, if it did not hold, then the lengths of vectors would change as we parallel transport them.$^{16}$ This would be highly undesirable for a description of the physics of the real world.
$^{16}$ Another consequence of this equation is that it provides the long-awaited explanation of what affine parametrizations actually are. They are those smooth parametrizations of a curve that have the property that the length of a vector doesn't change as we parallel transport the vector along the curve.
$^{17}$ The Leibniz product rule does indeed hold, as discussed in Part V.
$^{15}$ We do not yet have an explicit expression for how to compute the covariant derivative of a $(0,2)$ tensor like $\boldsymbol{g}$. We delay deriving the explicit expression until Part V. At this stage we state that it is given in components by
\begin{equation*} g_{\mu \nu ; \alpha}=g_{\mu \nu, \alpha}-\Gamma^{\beta}{}_{\alpha \mu} g_{\beta \nu}-\Gamma^{\beta}{}_{\alpha \nu} g_{\mu \beta} . \end{equation*}
It is also helpful at this stage to note that we can also write the derivative for a $(2,0)$ tensor like $\boldsymbol{T}$ in components as
\begin{equation*} T^{\mu \nu}{}_{; \alpha}=T^{\mu \nu}{}_{, \alpha}+\Gamma^{\mu}{}_{\alpha \beta} T^{\beta \nu}+\Gamma^{\nu}{}_{\alpha \beta} T^{\mu \beta} . \end{equation*}

Example 7.11

We shall prove the compatibility condition. The length of a vector is given by (the square root of) $\boldsymbol{A} \cdot \boldsymbol{A}=\boldsymbol{g}(\boldsymbol{A}, \boldsymbol{A})$, or $g_{\mu \nu} A^{\mu} A^{\nu}$. Take $\boldsymbol{A}$ to be a covariant constant (i.e. parallel) such that $\boldsymbol{\nabla}_{\alpha} \boldsymbol{A}=0$ (or $A^{\mu}{}_{; \alpha}=0$). If the length of the vector is constant we expect the covariant derivative of $\boldsymbol{g}(\boldsymbol{A}, \boldsymbol{A})$ to vanish. Assuming the Leibniz product rule$^{17}$ we have
\begin{equation*} g_{\mu \nu ; \alpha} A^{\mu} A^{\nu}+g_{\mu \nu} A^{\mu}{}_{; \alpha} A^{\nu}+g_{\mu \nu} A^{\mu} A^{\nu}{}_{; \alpha}=0 . \tag{7.36} \end{equation*}
Since $A^{\mu}{}_{; \alpha}=0$, the second and third terms vanish, leaving $g_{\mu \nu ; \alpha} A^{\mu} A^{\nu}=0$. As this holds for any such $\boldsymbol{A}$, we must have $g_{\mu \nu ; \alpha}=0$, as claimed.
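The compatibility condition can be checked directly in a simple case. The sketch below evaluates the component expression $g_{\mu \nu ; \alpha}=g_{\mu \nu, \alpha}-\Gamma^{\beta}{}_{\alpha \mu} g_{\beta \nu}-\Gamma^{\beta}{}_{\alpha \nu} g_{\mu \beta}$ for the flat plane in polar coordinates, assuming the metric $g=\operatorname{diag}\left(1, r^{2}\right)$ and the coordinate-basis coefficients $\Gamma^{r}{}_{\theta \theta}=-r$, $\Gamma^{\theta}{}_{r \theta}=\Gamma^{\theta}{}_{\theta r}=1 / r$. The function names are hypothetical.

```python
def metric(r):
    """Polar-coordinate metric components g[mu][nu] = diag(1, r^2)."""
    return [[1.0, 0.0], [0.0, r * r]]

def metric_partials(r):
    """dg[alpha][mu][nu] = d g_{mu nu} / d x^alpha; only d g_{theta theta} / dr is non-zero."""
    dg = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(2)]
    dg[0][1][1] = 2.0 * r
    return dg

def gamma(r):
    """Connection coefficients G[beta][alpha][mu] = Gamma^beta_{alpha mu}."""
    G = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(2)]
    G[0][1][1] = -r
    G[1][0][1] = G[1][1][0] = 1.0 / r
    return G

def metric_covariant_derivative(r):
    """g_{mu nu; alpha}, returned as D[alpha][mu][nu]."""
    g, dg, G = metric(r), metric_partials(r), gamma(r)
    idx = range(2)
    return [[[dg[al][mu][nu]
              - sum(G[be][al][mu] * g[be][nu] for be in idx)
              - sum(G[be][al][nu] * g[mu][be] for be in idx)
              for nu in idx] for mu in idx] for al in idx]

print(metric_covariant_derivative(3.0))
```

Every entry vanishes (up to rounding), even though $\partial g_{\theta \theta} / \partial r=2 r$ does not: the connection terms cancel the ordinary derivative of the metric components.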
Finally, we note that the compatibility condition is the basis of other links between the metric and the covariant derivative. The connection coefficients $\Gamma^{\mu}{}_{\alpha \beta}$ may be derived directly from the components of the metric.$^{18}$ We saw how the connection coefficients arose due to the change in the basis vectors with position in spacetime and could be calculated via derivatives like $\partial \boldsymbol{e}_{\mu} / \partial x^{\alpha}=\Gamma^{\lambda}{}_{\mu \alpha} \boldsymbol{e}_{\lambda}$. Recall also that the components of the metric reflect the basis vectors via $g_{\mu \nu}=\boldsymbol{g}\left(\boldsymbol{e}_{\mu}, \boldsymbol{e}_{\nu}\right)$ or, more simply, $g_{\mu \nu}=\boldsymbol{e}_{\mu} \cdot \boldsymbol{e}_{\nu}$. It therefore comes as little surprise that the connection coefficients are formed from a combination of first derivatives of the metric components, as enshrined in the conceptual expression
\begin{equation*} (\overline{\partial g}) \rightarrow(\bar{\Gamma}) . \tag{7.38} \end{equation*}
We shall expand on this point in the coming chapters.
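To make the step from metric derivatives to connection coefficients concrete, the sketch below applies eqn 7.37, solved for $\Gamma^{\rho}{}_{\mu \sigma}$ by contracting with the inverse metric, to the flat plane in polar coordinates. The metric $g=\operatorname{diag}\left(1, r^{2}\right)$ and the function names are assumptions made for this example; the derivation of the formula itself is left to the later chapter.

```python
def christoffel_from_metric(g_inv, dg):
    """Gamma[rho][mu][sigma] from eqn 7.37, given the inverse metric g_inv[rho][lam]
    and the partial derivatives dg[alpha][mu][nu] = d g_{mu nu} / d x^alpha."""
    idx = range(len(g_inv))
    return [[[sum(0.5 * g_inv[rho][lam]
                  * (dg[sigma][lam][mu] + dg[mu][lam][sigma] - dg[lam][mu][sigma])
                  for lam in idx)
              for sigma in idx] for mu in idx] for rho in idx]

def polar_inputs(r):
    """Inverse metric and metric derivatives for g = diag(1, r^2) in (r, theta)."""
    g_inv = [[1.0, 0.0], [0.0, 1.0 / (r * r)]]
    dg = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(2)]
    dg[0][1][1] = 2.0 * r  # d g_{theta theta} / d r
    return g_inv, dg

g_inv, dg = polar_inputs(2.0)
G = christoffel_from_metric(g_inv, dg)
print(G[0][1][1], G[1][0][1])  # Gamma^r_{theta theta} and Gamma^theta_{r theta}
```

At $r=2$ this gives $\Gamma^{r}{}_{\theta \theta}=-2$ and $\Gamma^{\theta}{}_{r \theta}=1 / 2$, i.e. $-r$ and $1 / r$, with all other coefficients zero.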
In the next two chapters, we turn to the use of the covariant derivative in understanding how a particle moves under the influence of gravitation.

Chapter summary

  • Parallel transport provides a method of comparing vectors at different points in curved space. A vector that is moved such that it has the same components in different local coordinate systems has been parallel transported.
  • The covariant derivative can be used to measure how vector fields change with position in spacetime. It is a directional derivative given by
\begin{equation*} \boldsymbol{\nabla}_{\boldsymbol{u}} \boldsymbol{v}=u^{\alpha}\left(\frac{\partial v^{\mu}}{\partial x^{\alpha}}+v^{\lambda} \Gamma^{\mu}{}_{\alpha \lambda}\right) \boldsymbol{e}_{\mu} . \tag{7.39} \end{equation*}
  • The tangent vector to the world line of a massive particle parametrized by the proper time is the timelike velocity vector $\boldsymbol{u}$, with the property $\boldsymbol{u} \cdot \boldsymbol{u}=-1$.
$^{18}$ We shall see in Chapter 9 that the expression we will need is
\begin{equation*} g_{\rho \lambda} \Gamma^{\rho}{}_{\mu \sigma}=\frac{1}{2}\left(\frac{\partial g_{\lambda \mu}}{\partial x^{\sigma}}+\frac{\partial g_{\lambda \sigma}}{\partial x^{\mu}}-\frac{\partial g_{\mu \sigma}}{\partial x^{\lambda}}\right) . \tag{7.37} \end{equation*}

and verify $\boldsymbol{u} \cdot \mathrm{D} \boldsymbol{u} / \mathrm{d} s$ vanishes.
Now consider the reparametrization $t=\sin s$. This is not in the form $t=a s+b$, so is not an affine parametrization.
(c) Recompute the components of the tangent vector $\boldsymbol{u}$, the derivative $\mathrm{D} \boldsymbol{u} / \mathrm{d} t$, and $\boldsymbol{u} \cdot \mathrm{D} \boldsymbol{u} / \mathrm{d} t$ using this new parametrization.
(7.3) Consider the vector field $\boldsymbol{v}$ in two-dimensional flat space with Cartesian components $\left(v^{x}, v^{y}\right)=(0, C x)$, with $C$ a constant.
(a) Compute the vector $\boldsymbol{\nabla}_{\mu} \boldsymbol{v}$ for $\mu=x$ and $y$.
(b) Convert the components of the vector into cylindrical polar coordinates using the transformations from Chapter 3.
(c) Using the connection coefficients given in the chapter, compute the vectors $\boldsymbol{\nabla}_{\mu} \boldsymbol{v}$ for $\mu=r$ and $\theta$.
(d) Treat the quantity $\left(\boldsymbol{\nabla}_{\mu} \boldsymbol{v}\right)^{\nu}$ as components of a $(1,1)$ tensor. Using the tensor transformation law, show that the results from part (c) are consistent with those of part (a).
(7.4) The covariant derivative of a $(0,2)$ tensor is written as
\begin{equation*} \left(\boldsymbol{\nabla}_{\alpha} \boldsymbol{\xi}\right)_{\mu \nu}=\frac{\partial \xi_{\mu \nu}}{\partial x^{\alpha}}-\Gamma^{\lambda}{}_{\alpha \mu} \xi_{\lambda \nu}-\Gamma^{\lambda}{}_{\alpha \nu} \xi_{\mu \lambda} \end{equation*}
Apply this to the components of the metric tensor and show, using eqn 7.37, that $\left(\boldsymbol{\nabla}_{\alpha} \boldsymbol{g}\right)_{\mu \nu}=0$.

Free fall and geodesics

The bigger they come, the harder they fall
Barbados Joe Walcott (1873-1935) and Bob Fitzsimmons (1863-1917)
A geodesic can be thought of geometrically as the straightest possible path in a curved spacetime. Geodesics are the paths that extremize the interval between spacetime events. Physically, a particle in free fall has a world line that follows a geodesic.$^{1}$ The equation of motion for such a particle, known as the geodesic equation, therefore tells us about the motion of a particle that is not subject to external forces. In this chapter, we investigate geodesics and derive the geodesic equation that tells us how curvature causes particles to move. Our task here is to introduce some key ideas in geodesic motion. In the next chapter, we look at the details of how to extract the connection coefficients required to compute geodesics.

Example 8.1

In pre-relativity physics, a particle subject to no forces does not accelerate and has an equation of motion given by $\ddot{\vec{x}}=0$. In relativity, a particle follows a path $x^{\mu}(\tau)=(t(\tau), x(\tau), y(\tau), z(\tau))$ parametrized by some affine parameter, such as its proper time $\tau$. If a particle is in a flat, Minkowski spacetime, we have an acceleration
\begin{equation*} \frac{\mathrm{d}^{2} x^{\mu}}{\mathrm{d} \tau^{2}}=0 \quad \text{(flat spacetime).} \end{equation*}
We can immediately write down the set of possible geodesics as
\begin{equation*} x^{\mu}(\tau)=b^{\mu} \tau+c^{\mu} \tag{8.2} \end{equation*}
where $b^{\mu}$ and $c^{\mu}$ are a set of constant components. We see that in Minkowski spacetime, the particles fall along straight lines.

8.1 Extremal intervals

Hamilton's principle tells us that the trajectory realized by a system is the one for which the action takes a stationary value when the trajectory is varied. The action often takes a minimum value (e.g. for straight line motion in a two-dimensional plane), but examples of why and when it takes a saddle point or maximum value are less well known. However, they can be important in relativity.

${ }^{1}$ One way to justify this is to recall from mechanics that the action for free, massive particles is given by $S=-m \int\left(-\mathrm{d} s^{2}\right)^{\frac{1}{2}}$. The result of extremizing the action is the equations of motion for the particle. Since $m$ is a scalar, extremizing the action amounts to extremizing the timelike interval $\Delta \tau=\int\left(-\mathrm{d} s^{2}\right)^{\frac{1}{2}}$ between events with timelike separation. This means that a solution of the equations of motion gives us a geodesic (which is timelike for a massive particle). It is worth stressing that it is only free particles that travel along geodesics. Particles subject to other interactions have additional terms in their Lagrangian and so give rise to equations of motion whose solutions are not the geodesics of spacetime. Although this argument applies to massive particles and their timelike geodesics, we can also discuss null and spacelike geodesics. To compute spacelike geodesics we extremize the proper length $\Delta l=\int\left(+\mathrm{d} s^{2}\right)^{\frac{1}{2}}$ between events with spacelike separation. Although nothing travels on spacelike geodesics (except hypothetical tachyon particles, which travel faster than light), spacelike geodesics are often interesting and useful. Photons travel along null geodesics and are discussed in Section 8.4.
${ }^{2}$ More generally, a conjugate point is a point where the matrix $N_{i j}^{-1}$ is singular, and this occurs at $q_{1}^{*}$ since a nonzero value of $\delta p^{j}$ gives no variation in $\delta q^{i}$. The rule is that the trajectory is a minimum in the action if the trajectory does not pass through a conjugate point. This was the case for path (i). The trajectory is not a minimum in the action if the trajectory passes through a conjugate point, which is the case for path (ii). In general, if two geodesics are sent out from a point $\mathcal{P}$ and later cross at a point $\mathcal{Q}$, then $\mathcal{Q}$ is a conjugate point to $\mathcal{P}$. This argument has a very important use in singularity theorems, such as the one discussed in Chapter 50.
Fig. 8.1 (a) Points $q_{1}$ and $q_{2}$ on the sphere. The conjugate point $q_{1}^{*}$ is at the antipode of $q_{1}$. Paths (i) and (ii), which lie on a great circle, are shown. (b) Setting off a swarm of particles at $q_{1}$ results in their trajectories realizing a focus at $q_{1}^{*}$.
Example 8.2
Consider non-relativistic motion between two points on the surface of a sphere, starting at $q_{1}$ and ending at $q_{2}$, as shown in Fig. 8.1(a). We assume that these points are not antipodal (i.e. not opposite points on the sphere). There are two geodesics: (i) a trajectory that represents the shortest distance between the points; and (ii) a trajectory that lies on the same great circle, but which heads off from $q_{1}$ in the other direction, through its antipodal point and then to $q_{2}$. For particles set off by an observer along these two paths with the same momentum, path (i) takes least time and gives the minimum action; path (ii) takes longer and gives a saddle point action. There is a way to use this example to work out whether a trajectory is a minimum or not. We set in motion a swarm of free particles from $q_{1}$, almost along path (i), but with slightly different directions of their momentum, as shown in Fig. 8.1(b). The swarm spreads out initially, with each particle following its own geodesic. In general, the variation in initial momenta $\delta p^{j}$ (where $j$ labels the particle in question) will lead to variations in position $\delta q^{i}$ at some final time, given by a matrix equation $\delta q^{i}=N_{i j} \delta p^{j}$. However, on the sphere, the result of this thought experiment is that all of the trajectories eventually collapse down and focus at the antipode of $q_{1}$. This focal point at $q_{1}^{*}$ is known as a conjugate point${ }^{2}$ to $q_{1}$. The trajectory that passes through $q_{1}^{*}$ is the saddle point; the trajectory that does not is the minimum.
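The focusing of the swarm can be checked numerically. The following is a minimal sketch, assuming a unit sphere embedded in three-dimensional Euclidean space with $q_{1}$ placed at the point $(1,0,0)$; the function names and the chosen launch angles are illustrative and do not come from the text. It uses the fact that a great circle leaving the unit vector $\boldsymbol{q}$ along the unit tangent $\boldsymbol{t}$ is $\boldsymbol{x}(\lambda)=\boldsymbol{q} \cos \lambda+\boldsymbol{t} \sin \lambda$.

```python
import math

def great_circle_point(q, t, lam):
    """Point reached after arc length lam on the unit sphere,
    starting at unit vector q and heading along unit tangent t."""
    return tuple(qi * math.cos(lam) + ti * math.sin(lam) for qi, ti in zip(q, t))

def tangent_at_q1(angle):
    """Unit tangent at q1 = (1, 0, 0), tilted by `angle` away from the y-direction."""
    return (0.0, math.cos(angle), math.sin(angle))

# Assumed set-up: q1 on the equator, five slightly different launch directions.
q1 = (1.0, 0.0, 0.0)
tilts = [-0.02, -0.01, 0.0, 0.01, 0.02]

def swarm_spread(lam):
    """Largest chord distance between any swarm member and the central path."""
    centre = great_circle_point(q1, tangent_at_q1(0.0), lam)
    return max(
        math.dist(great_circle_point(q1, tangent_at_q1(a), lam), centre)
        for a in tilts
    )

# The spread grows until lam = pi/2 and collapses again at the antipode, lam = pi.
for lam in (0.0, math.pi / 4, math.pi / 2, 3 * math.pi / 4, math.pi):
    print(f"lambda = {lam:.3f}  spread = {swarm_spread(lam):.6f}")
```

The spread vanishes (to rounding error) at $\lambda=\pi$ whatever the small tilt angles are, which is the statement that the antipode is conjugate to $q_{1}$.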
Turning now to geometry, the line element is given in terms of metric components by $\mathrm{d} s^{2}=g_{\mu \nu} \mathrm{d} x^{\mu} \mathrm{d} x^{\nu}$. We shall be interested in the extremal value of the total interval $s=\int \sqrt{\left|\mathrm{d} s^{2}\right|}$ for the path between two points. Whether this extremal interval represents a maximum or minimum interval now depends on the signature of the metric.

Example 8.3

With a Riemannian metric (that is, one with signature ++++ ) it is always possible to find arbitrarily long paths between two points, but the path length is bounded from below at a minimum value representing the shortest path between the two points. This path is a geodesic and represents the straightest possible curve in the space represented by the metric. (However, owing to the discussion in the last example, we cannot say that some given geodesic is necessarily the shortest distance between the two points.)
If we have a Lorentz metric (with signature $-+++$) then the interval between two points is positive if the interval is spacelike, negative if the interval is timelike and zero if the interval is null. It is always possible to find timelike curves with arbitrarily small intervals of proper time linking two points. If a curve of maximum proper time exists, it will be a timelike geodesic and represent the straightest path between the two points. This might seem the wrong way round, but follows from the minus sign in front of the timelike component in the metric. As a sanity check, we can confirm that a timelike geodesic does not minimize the length of a curve using a graphical method. Consider Fig. 8.2, showing a timelike curve approximated by a series of null paths. The timelike interval $\Delta \tau=\int \mathrm{d} \tau=\int\left(-\mathrm{d} s^{2}\right)^{\frac{1}{2}}$ is positive, but is infinitesimally close to a path formed from a series of null paths (Fig. 8.2) which, by definition, each have $\mathrm{d} s=0$, giving a vanishing total interval. It is therefore possible to make the timelike interval arbitrarily small.
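The graphical argument can be made quantitative in flat spacetime with one space dimension. The sketch below is an illustration under stated assumptions rather than anything taken from the text: units with $c=1$, the metric $\mathrm{d} s^{2}=-\mathrm{d} t^{2}+\mathrm{d} x^{2}$, and a zigzag world line from $(t, x)=(0,0)$ to $(T, 0)$ built from straight segments travelled alternately at speeds $+v$ and $-v$. The function name `zigzag_proper_time` is hypothetical.

```python
import math

def zigzag_proper_time(total_time, speed, n_segments):
    """Proper time along a zigzag world line from (t, x) = (0, 0) to (total_time, 0).

    The path has n_segments straight pieces (an even number, so that it returns
    to x = 0), alternately moving at +speed and -speed. Assumes c = 1 and the
    flat metric ds^2 = -dt^2 + dx^2.
    """
    if n_segments % 2 != 0:
        raise ValueError("n_segments must be even")
    dt = total_time / n_segments
    tau = 0.0
    for k in range(n_segments):
        dx = speed * dt if k % 2 == 0 else -speed * dt
        ds2 = -dt**2 + dx**2               # negative for a timelike segment
        tau += math.sqrt(max(-ds2, 0.0))   # clamp guards against rounding at speed = 1
    return tau

T = 1.0
for v in (0.0, 0.5, 0.9, 0.99, 0.9999):
    print(f"v = {v:<7} proper time = {zigzag_proper_time(T, v, 10):.6f}")
```

The straight path ($v=0$) gives the largest proper time, and the total shrinks towards zero as the segments approach null paths ($v \rightarrow 1$), independently of how many segments are used.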
We shall parametrize paths in spacetime using an affine parameter $\lambda$. To find the geodesic curve $x^{\mu}(\lambda)$ that extremizes the interval $s$ between two points, we split the interval into elements of length $\mathrm{d} s$ and write
\begin{equation*} s=\int \sqrt{\left|\mathrm{d} s^{2}\right|}=\int \mathrm{d} \lambda\left|\left(\frac{\mathrm{d} s}{\mathrm{d} \lambda}\right)^{2}\right|^{\frac{1}{2}}=\int \mathrm{d} \lambda\, L\left(x^{\mu}, \dot{x}^{\mu}\right), \tag{8.3} \end{equation*}
which gives us a method for identifying the function $L\left(x^{\mu}, \dot{x}^{\mu}\right)$ in this problem as $L=\left|(\mathrm{d} s / \mathrm{d} \lambda)^{2}\right|^{\frac{1}{2}}$. This function must obey the Euler-Lagrange (E-L) equation from Chapter 2, whose solution will allow us to identify the geodesic. Let's discuss a simple example.
Example 8.4
In Cartesian coordinates in two dimensions, we write the distance between points as
\begin{equation*} \mathrm{d} s^{2}=\mathrm{d} x^{2}+\mathrm{d} y^{2} \end{equation*}
and so
\begin{equation*} L=\frac{\mathrm{d} s}{\mathrm{d} \lambda}=\left[\left(\frac{\mathrm{d} x}{\mathrm{d} \lambda}\right)^{2}+\left(\frac{\mathrm{d} y}{\mathrm{d} \lambda}\right)^{2}\right]^{\frac{1}{2}} \tag{8.5} \end{equation*}
The task is to find the shortest path between points in the plane (a spacelike geodesic). We know the answer: the path is, of course, a straight line. Applying the E-L equations for the variable $x$, we find
\begin{equation*} \frac{\partial L}{\partial\left(\frac{\mathrm{d} x}{\mathrm{d} \lambda}\right)}=\frac{1}{L} \frac{\mathrm{d} x}{\mathrm{d} \lambda}, \quad \frac{\partial L}{\partial x}=0 . \tag{8.6} \end{equation*}
Similar expressions are found for the variable $y$.
Before powering ahead to solve these equations, it's useful to take a closer look at the idea of parametrizing a path. Notice that the particular recipe for choosing $\lambda$ along the curve hasn't been specified. Considering the interval from the previous example, we see that the choice of $\lambda$ is arbitrary: simply scaling $\lambda \rightarrow a \lambda+b$ has no effect on the action.${ }^{3}$
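The independence of the interval from the choice of parameter can be checked numerically. The following is a minimal sketch, assuming a quarter of the unit circle as the test curve and the nonlinear change of parameter $\lambda=\eta^{2}$; the curve, the reparametrization and the function names are illustrative choices, not taken from the text. The integral $s=\int L\, \mathrm{d} \lambda$ is estimated with a midpoint rule and central-difference derivatives.

```python
import math

def curve_lambda(lam):
    """Quarter of the unit circle, parametrized by lam in [0, 1]."""
    angle = 0.5 * math.pi * lam
    return math.cos(angle), math.sin(angle)

def curve_eta(eta):
    """The same curve with lam = eta**2, a nonlinear change of parameter."""
    return curve_lambda(eta**2)

def path_length(curve, n_steps=20000):
    """Midpoint-rule estimate of s = integral of L dp over p in [0, 1], where
    L = sqrt((dx/dp)^2 + (dy/dp)^2) and derivatives are central differences."""
    h = 1.0 / n_steps
    total = 0.0
    for k in range(n_steps):
        p = (k + 0.5) * h
        x_plus, y_plus = curve(p + 0.5 * h)
        x_minus, y_minus = curve(p - 0.5 * h)
        dx_dp = (x_plus - x_minus) / h
        dy_dp = (y_plus - y_minus) / h
        total += math.sqrt(dx_dp**2 + dy_dp**2) * h
    return total

print("length using lambda:", path_length(curve_lambda))
print("length using eta:   ", path_length(curve_eta))
print("exact value pi/2:   ", math.pi / 2)
```

Both estimates agree with $\pi / 2$ to the accuracy of the quadrature, even though $L$ itself is a different function of the parameter in the two cases.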
We shall exploit the freedom to choose $\lambda$ to make life easy for us. We vary the Lagrangian the first time, to find $\partial L / \partial x$ and $\partial L / \partial \dot{x}$, with an as-yet-unspecified parametrization $\lambda$. This tells us how the action changes with $x$ and $\dot{x}=\mathrm{d} x / \mathrm{d} \lambda$. After this stage, the dependence of the interval on the parameter $\lambda$ has been determined, but the precise choice of $\lambda$ is still unspecified.${ }^{4}$ Next, we make a choice of $\lambda$. The labour-saving parametrization to choose is called length parametrization. This choice is simply that $\mathrm{d} \lambda=\mathrm{d} s$, which implies that the parameter $\lambda$ measures the interval along the length of the curve. The physical interpretation of this choice is that the parameter $\lambda$ represents the proper time for timelike paths or proper length for spacelike paths. So for timelike paths, $\lambda$ is the (proper) time measured by the observer in their locally flat spacetime as they fall along the geodesic. Length parametrization is therefore not just convenient, it is necessary to allow us to interpret $s$ as the interval between events in spacetime.
In the case of Cartesian coordinates discussed above, we would write $\mathrm{d} \lambda=\mathrm{d} s=\left(\mathrm{d} x^{2}+\mathrm{d} y^{2}\right)^{\frac{1}{2}}$, which, inserted into the Lagrangian, yields
\begin{equation*} L=\left[\left(\frac{\mathrm{d} x}{\mathrm{d} \lambda}\right)^{2}+\left(\frac{\mathrm{d} y}{\mathrm{d} \lambda}\right)^{2}\right]^{\frac{1}{2}}=1 . \tag{8.10} \end{equation*}
Fig. 8.2 A smooth timelike curve can be represented as being approximated by a series of null paths. As the number of null paths is increased we get closer to the timelike curve. A timelike curve is in this sense infinitesimally close to a series of curves of zero length.
${ }^{3}$ If we have another parameter $\eta$, such that $\lambda=\lambda(\eta)$, we write
\begin{equation*} \frac{\mathrm{d} x}{\mathrm{d} \lambda}=\frac{\mathrm{d} x}{\mathrm{d} \eta} \frac{\mathrm{d} \eta}{\mathrm{d} \lambda} \tag{8.7} \end{equation*}
and also the differential
\begin{equation*} \mathrm{d} \lambda=\frac{\mathrm{d} \lambda}{\mathrm{d} \eta} \mathrm{d} \eta \tag{8.8} \end{equation*}
We see that the interval becomes
\begin{equation*} \begin{aligned} s & =\int \mathrm{d} \lambda\left[\left(\frac{\mathrm{d} x}{\mathrm{d} \lambda}\right)^{2}+\left(\frac{\mathrm{d} y}{\mathrm{d} \lambda}\right)^{2}\right]^{\frac{1}{2}} \\ & =\int \frac{\mathrm{d} \lambda}{\mathrm{d} \eta} \mathrm{d} \eta\left[\left(\frac{\mathrm{d} x}{\mathrm{d} \eta} \frac{\mathrm{d} \eta}{\mathrm{d} \lambda}\right)^{2}+\left(\frac{\mathrm{d} y}{\mathrm{d} \eta} \frac{\mathrm{d} \eta}{\mathrm{d} \lambda}\right)^{2}\right]^{\frac{1}{2}} \\ & =\int \mathrm{d} \eta\left[\left(\frac{\mathrm{d} x}{\mathrm{d} \eta}\right)^{2}+\left(\frac{\mathrm{d} y}{\mathrm{d} \eta}\right)^{2}\right]^{\frac{1}{2}} \end{aligned} \end{equation*}
This implies that we can just as well use $\eta$ as $\lambda$ and expect no change in the form of the action integral.
${ }^{4}$ One can think of $\lambda$ as cancelling in the left-hand side of the E-L equation
\begin{equation*} \frac{\mathrm{d}}{\mathrm{d} \lambda} \frac{\partial L}{\partial\left(\frac{\mathrm{d} x}{\mathrm{d} \lambda}\right)}=\frac{\partial L}{\partial x} \tag{8.9} \end{equation*}
${ }^{5}$ The dot notation is used here to denote a derivative with respect to the parameter $\lambda$.
This does not imply that we are varying unity (with the inevitable result that the terms in the equation vanish): we have already taken a first set of derivatives and so the dependence on $\lambda$ is now fixed. A consequence of the choice of parametrization is that, in addition to the equations of motion derived from the Euler-Lagrange equations, we also have an additional constraint equation given, for our choice, by $L=1$. There is some redundancy here: the Euler-Lagrange equations give us enough information to find the extremal path. However, the extra equation frequently simplifies the algebra and is therefore a valuable help.
After this detour, let's now return to our simple example.

Example 8.5

Use our freedom to choose $\lambda$ such that $L=1$ for the remainder of the calculation. We then have
\begin{equation*} \frac{\mathrm{d}}{\mathrm{d} \lambda}\left(\frac{\partial L}{\partial\left(\frac{\mathrm{d} x}{\mathrm{d} \lambda}\right)}\right)=\frac{\mathrm{d}}{\mathrm{d} \lambda}\left(\frac{1}{L} \frac{\mathrm{d} x}{\mathrm{d} \lambda}\right)=\frac{\mathrm{d}^{2} x}{\mathrm{d} \lambda^{2}}=0 \tag{8.11} \end{equation*}
and also, of course, $\mathrm{d}^{2} y / \mathrm{d} \lambda^{2}=0$.
Since the double derivative of each of the coordinates is zero, the path must represent a straight line. We can check that these equations are solved by the equation for a straight line
\begin{equation*} y(\lambda)=m x(\lambda)+c \tag{8.12} \end{equation*}
with $m$ constant, by using the parametrization $x=\lambda$ and $y=m \lambda+c$. We find
\begin{equation*} \frac{\mathrm{d} x}{\mathrm{d} \lambda}=1, \quad \frac{\mathrm{d} y}{\mathrm{d} \lambda}=m \tag{8.13} \end{equation*}
and so both of these expressions will, of course, give zero if differentiated again with respect to $\lambda$.
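As a check on the parametrization $x=\lambda$, $y=m \lambda+c$, the worked step below evaluates $L$ along the line. It suggests that this choice is affine (so $L$ is constant and the second derivatives still vanish) but is a length parametrization only when $m=0$. The rescaled parameter $\tilde{\lambda}$ is notation introduced here for illustration and does not appear in the text.

```latex
\begin{align*}
L &= \left[\left(\frac{\mathrm{d}x}{\mathrm{d}\lambda}\right)^{2}
      + \left(\frac{\mathrm{d}y}{\mathrm{d}\lambda}\right)^{2}\right]^{\frac{1}{2}}
   = \left(1+m^{2}\right)^{\frac{1}{2}} = \text{const.}, \\
\tilde{\lambda} &= \left(1+m^{2}\right)^{\frac{1}{2}}\lambda
   \quad\Rightarrow\quad
   x = \frac{\tilde{\lambda}}{\left(1+m^{2}\right)^{\frac{1}{2}}},\qquad
   y = \frac{m\,\tilde{\lambda}}{\left(1+m^{2}\right)^{\frac{1}{2}}} + c, \\
L &= \left[\left(\frac{\mathrm{d}x}{\mathrm{d}\tilde{\lambda}}\right)^{2}
      + \left(\frac{\mathrm{d}y}{\mathrm{d}\tilde{\lambda}}\right)^{2}\right]^{\frac{1}{2}}
   = \left[\frac{1+m^{2}}{1+m^{2}}\right]^{\frac{1}{2}} = 1 .
\end{align*}
```

Because the two parameters are related by a constant rescaling, both describe the same straight line and both satisfy $\mathrm{d}^{2} x / \mathrm{d} \lambda^{2}=\mathrm{d}^{2} y / \mathrm{d} \lambda^{2}=0$.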
Now for a more interesting example. We can use the same techniques to find the equations describing the geodesics on the surface of a unit sphere.

Example 8.6

The line element on the surface of a sphere with fixed radius $r=1$ is given by
\begin{equation*} \mathrm{d} s^{2}=\mathrm{d} \theta^{2}+\sin ^{2} \theta\, \mathrm{d} \phi^{2} \tag{8.14} \end{equation*}
The parametrized interval is then given by
\begin{equation*} s=\int \mathrm{d} \lambda\left[\left(\frac{\mathrm{d} \theta}{\mathrm{d} \lambda}\right)^{2}+\sin ^{2} \theta\left(\frac{\mathrm{d} \phi}{\mathrm{d} \lambda}\right)^{2}\right]^{\frac{1}{2}} \tag{8.15} \end{equation*}
Feeding the integrand into the E-L equations we obtain, at a first stage${ }^{5}$
\begin{align*} \frac{\partial L}{\partial \dot{\theta}} & =\frac{\dot{\theta}}{L}, & \frac{\partial L}{\partial \dot{\phi}} & =\sin ^{2} \theta \frac{\dot{\phi}}{L}, \\ \frac{\partial L}{\partial \theta} & =\sin \theta \cos \theta \frac{\dot{\phi}^{2}}{L}, & \frac{\partial L}{\partial \phi} & =0 . \end{align*}
Invoking length parametrization, we set $L=1$ and we have that
\begin{equation*} \frac{\mathrm{d}}{\mathrm{d} \lambda} \frac{\partial L}{\partial \dot{\theta}}=\ddot{\theta}, \quad \frac{\mathrm{d}}{\mathrm{d} \lambda} \frac{\partial L}{\partial \dot{\phi}}=2 \sin \theta \cos \theta\, \dot{\theta} \dot{\phi}+\sin ^{2} \theta\, \ddot{\phi} \tag{8.17} \end{equation*}
The equations of motion can then be arranged${ }^{6}$
\begin{align*} \frac{\mathrm{d}^{2} \theta}{\mathrm{d} \lambda^{2}}-\sin \theta \cos \theta\left(\frac{\mathrm{d} \phi}{\mathrm{d} \lambda}\right)^{2} & =0 \tag{8.19}\\ \frac{\mathrm{d}}{\mathrm{d} \lambda}\left(\sin ^{2} \theta \frac{\mathrm{d} \phi}{\mathrm{d} \lambda}\right) & =0 \tag{8.20} \end{align*}
These equations are solved if $\phi$ is constant and $\theta$ increases linearly with $\lambda$ (path $A$ in Fig. 8.3). There is a similar solution where $\theta=\pi / 2$ and $\phi$ increases linearly with $\lambda$ (path $B$ in Fig. 8.3). These solutions are familiar as the shortest distances between points on a sphere since they are arcs of great circles. Generally, $\theta=$ const. is not a solution and so path $C$ in Fig. 8.3, which has $\theta=\pi / 4$, is not a solution.${ }^{7}$
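The claims about paths $A$, $B$ and $C$ can be verified by substituting each path into the left-hand sides of eqns 8.19 and 8.20. The sketch below does this numerically; the function name `geodesic_residuals` and the sample point chosen on path $A$ are illustrative assumptions. Equation 8.20 is expanded with the product rule before evaluation.

```python
import math

def geodesic_residuals(theta, dtheta, ddtheta, dphi, ddphi):
    """Left-hand sides of eqns 8.19 and 8.20 on the unit sphere, given theta and
    the first and second lambda-derivatives of theta and phi.
    Eqn 8.20 is expanded as 2 sin(theta) cos(theta) theta' phi' + sin^2(theta) phi''."""
    r_theta = ddtheta - math.sin(theta) * math.cos(theta) * dphi**2
    r_phi = (2.0 * math.sin(theta) * math.cos(theta) * dtheta * dphi
             + math.sin(theta)**2 * ddphi)
    return r_theta, r_phi

# Path A: theta = lambda, phi = 0, sampled at the arbitrary point lambda = 0.7.
path_a = geodesic_residuals(theta=0.7, dtheta=1.0, ddtheta=0.0, dphi=0.0, ddphi=0.0)
# Path B: theta = pi/2, phi = lambda (the equator).
path_b = geodesic_residuals(theta=math.pi / 2, dtheta=0.0, ddtheta=0.0, dphi=1.0, ddphi=0.0)
# Path C: theta = pi/4, phi = lambda (a circle of constant latitude).
path_c = geodesic_residuals(theta=math.pi / 4, dtheta=0.0, ddtheta=0.0, dphi=1.0, ddphi=0.0)

for name, residuals in (("A", path_a), ("B", path_b), ("C", path_c)):
    print(f"path {name}: residuals = ({residuals[0]:+.3e}, {residuals[1]:+.3e})")
```

Paths $A$ and $B$ give residuals that vanish to rounding error, while path $C$ leaves a residual of $-\sin (\pi / 4) \cos (\pi / 4)=-1 / 2$ in eqn 8.19, so it is not a geodesic.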
With some geodesics under our belt, we now turn to the more general problem of finding the geodesic representing the path of a particle in free fall.

8.2 A geodesic equation

We define the covariant acceleration vector for a massive particle as $\boldsymbol{a}=\mathrm{D} \boldsymbol{u} / \mathrm{d} \tau$, where $\boldsymbol{u}$ is the particle's velocity and the proper time $\tau$ parametrizes the world line. A particle in free fall follows a geodesic. It feels no force (by definition of free fall), so has no covariant acceleration, giving $\mathrm{D} \boldsymbol{u} / \mathrm{d} \tau=0$. Geometrically, the particle's velocity is tangent to its world line, so an equivalent expression is $\boldsymbol{a}=\boldsymbol{\nabla}_{\boldsymbol{u}} \boldsymbol{u}=0$. Recall that parallel transport of a vector $\boldsymbol{v}$ along a path with tangent $\boldsymbol{u}$ implies that the covariant derivative $\boldsymbol{\nabla}_{\boldsymbol{u}} \boldsymbol{v}$ vanishes. So we now arrive at a geometrical definition that the geodesic is a path in spacetime that parallel transports its own tangent vector. Although we've discussed the dynamics of a particle, this geometric definition applies to any geodesic (e.g. a spacelike one) parametrized by an arbitrary affine parameter $\lambda$. Therefore, we can test if a curve with tangent vector $\boldsymbol{u}$ is a geodesic using
\begin{equation*} \frac{\mathrm{D} \boldsymbol{u}}{\mathrm{d} \lambda}=\boldsymbol{\nabla}_{\boldsymbol{u}} \boldsymbol{u}=0 \tag{8.21} \end{equation*}
Using the expression from the previous chapter, we have an equation for components
\begin{equation*} \left(\boldsymbol{\nabla}_{\boldsymbol{u}} \boldsymbol{u}\right)^{\mu}=u^{\alpha}\left(\frac{\partial u^{\mu}}{\partial x^{\alpha}}+u^{\beta} \Gamma_{\alpha \beta}^{\mu}\right)=0 \tag{8.22} \end{equation*}
We then have, using $u^{\alpha}=\mathrm{d} x^{\alpha} / \mathrm{d} \lambda$, that
\begin{equation*} \frac{\mathrm{d} x^{\alpha}}{\mathrm{d} \lambda} \frac{\partial u^{\mu}}{\partial x^{\alpha}}+\frac{\mathrm{d} x^{\alpha}}{\mathrm{d} \lambda} \frac{\mathrm{d} x^{\beta}}{\mathrm{d} \lambda} \Gamma_{\alpha \beta}^{\mu}=0 \tag{8.23} \end{equation*}
We notice that the first term can be written as $\mathrm{d} u^{\mu} / \mathrm{d} \lambda$, which is simply the acceleration in the ordinary, flat, Cartesian system: the double derivative of $x^{\mu}$ with respect to $\lambda$. This allows us to write a differential equation for the path of a geodesic, known as the geodesic equation
\begin{equation*} \frac{\mathrm{d}^{2} x^{\mu}}{\mathrm{d} \lambda^{2}}+\frac{\mathrm{d} x^{\alpha}}{\mathrm{d} \lambda} \frac{\mathrm{d} x^{\beta}}{\mathrm{d} \lambda} \Gamma_{\alpha \beta}^{\mu}=0 \tag{8.24} \end{equation*}
${ }^{6}$ The latter equation can be expanded to read
\begin{equation*} \frac{\mathrm{d}^{2} \phi}{\mathrm{d} \lambda^{2}}+2 \frac{\cos \theta}{\sin \theta} \frac{\mathrm{d} \theta}{\mathrm{d} \lambda} \frac{\mathrm{d} \phi}{\mathrm{d} \lambda}=0 \tag{8.18} \end{equation*}
${ }^{7}$ For example, set $\theta(\lambda)=\lambda$ and $\phi(\lambda)=0$ and the equations are solved. Similarly $\theta=\pi / 2$ and $\phi=\lambda$ solve the equations. Setting $\theta=\pi / 4$ and $\phi=\lambda$ does not solve the equations.
Fig. 8.3 Paths on the surface of a sphere. $A$ (which runs from the North Pole to the equator) and $B$ (which runs round the equator) are geodesics; $C$ (dashed curve) is not.
${ }^{8}$ Keep in mind that instead of $\tau$, we are free to use any affine parameter $\lambda$ related to $\tau$ via $\lambda=a \tau+b$, where $a$ and $b$ are constants. In fact, this provides us with another definition of an affine parameter: they are those parameters for which the description of the world line has the form of the geodesic equation. Note that intervals for photons cannot be assigned a proper time (or length) since they travel along null geodesics. In terms of an affine parameter $\sigma$, the infinitesimal interval between two closely spaced events on a photon's world line is
\begin{equation*} g_{\mu \nu} \frac{\mathrm{d} x^{\mu}}{\mathrm{d} \sigma} \frac{\mathrm{d} x^{\nu}}{\mathrm{d} \sigma}=0 \tag{8.25} \end{equation*}
where $\mathrm{d} x^{\mu}$ is the coordinate interval between the events. We discuss photons further at the end of this chapter.
Taking $\lambda=\tau$, this is the equation of motion for a massive particle in curved spacetime in the absence of an external force. We can interpret the geodesic equation as telling us that particles that aren't subject to an external force freely fall along a geodesic, following what a local observer would interpret to be straight lines.${ }^{8}$
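The geodesic equation (eqn 8.24) is a set of coupled second-order ordinary differential equations, so it can be integrated numerically once the connection coefficients are known. The following is a minimal sketch using a classical fourth-order Runge-Kutta step on the unit sphere. It assumes the non-zero coefficients $\Gamma^{\theta}{ }_{\phi \phi}=-\sin \theta \cos \theta$ and $\Gamma^{\phi}{ }_{\theta \phi}=\Gamma^{\phi}{ }_{\phi \theta}=\cos \theta / \sin \theta$, read off by comparing eqns 8.18 and 8.19 with eqn 8.24. All function names, the step count and the initial conditions are illustrative and not from the text.

```python
import math

def christoffel_unit_sphere(x):
    """Connection coefficients gamma[mu][alpha][beta] on the unit sphere with
    coordinates x = (theta, phi). Assumed values, read off from eqns 8.18, 8.19."""
    theta = x[0]
    gamma = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]
    gamma[0][1][1] = -math.sin(theta) * math.cos(theta)
    gamma[1][0][1] = gamma[1][1][0] = math.cos(theta) / math.sin(theta)
    return gamma

def geodesic_rhs(state, christoffel):
    """First-order form of eqn 8.24: d(x, u)/dlambda = (u, -Gamma u u)."""
    n = len(state) // 2
    x, u = state[:n], state[n:]
    gamma = christoffel(x)
    accel = [-sum(gamma[mu][a][b] * u[a] * u[b] for a in range(n) for b in range(n))
             for mu in range(n)]
    return u + accel

def rk4_step(state, h, christoffel):
    """One classical fourth-order Runge-Kutta step of size h."""
    def shifted(k, factor):
        return [s + factor * ki for s, ki in zip(state, k)]
    k1 = geodesic_rhs(state, christoffel)
    k2 = geodesic_rhs(shifted(k1, h / 2), christoffel)
    k3 = geodesic_rhs(shifted(k2, h / 2), christoffel)
    k4 = geodesic_rhs(shifted(k3, h), christoffel)
    return [s + h / 6 * (a + 2 * b + 2 * c + d)
            for s, a, b, c, d in zip(state, k1, k2, k3, k4)]

def integrate_geodesic(x0, u0, lam_max, n_steps=2000,
                       christoffel=christoffel_unit_sphere):
    """Integrate from lambda = 0 to lam_max; returns [theta, phi, u_theta, u_phi]."""
    state = list(x0) + list(u0)
    h = lam_max / n_steps
    for _ in range(n_steps):
        state = rk4_step(state, h, christoffel)
    return state

# Start on the equator heading due east: the path should stay on the equator.
equator_run = integrate_geodesic((math.pi / 2, 0.0), (0.0, 1.0), 1.0)
# Start on the equator with a tilted unit tangent and follow it for arc length pi.
tilt = 0.3
tilted_run = integrate_geodesic((math.pi / 2, 0.0),
                                (-math.sin(tilt), math.cos(tilt)), math.pi)
print("equator run:", equator_run)
print("tilted run: ", tilted_run)
```

In the tilted run the particle arrives at $\theta=\pi / 2$, $\phi=\pi$, the antipode of its starting point, as expected for a great circle. The quantity $\dot{\theta}^{2}+\sin ^{2} \theta\, \dot{\phi}^{2}$ also stays equal to its initial value of 1 along the path. Other spaces can be explored by passing a different `christoffel` function, provided the coordinates stay away from singular points such as the poles.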

Example 8.7

The quantity $\boldsymbol{u} \cdot \boldsymbol{u}$, formed from the tangent vector $\boldsymbol{u}$ of a geodesic, determines whether the geodesic is timelike $(\boldsymbol{u} \cdot \boldsymbol{u}=-1)$, null $(=0)$, or spacelike (where, if we take the affine parameter $\lambda$ to be the proper length, we will have $\boldsymbol{u} \cdot \boldsymbol{u}=1$). We can see how this quantity changes along a geodesic by evaluating
\begin{equation*} \boldsymbol{\nabla}_{\boldsymbol{u}}(\boldsymbol{u} \cdot \boldsymbol{u})=2 \boldsymbol{u} \cdot\left(\boldsymbol{\nabla}_{\boldsymbol{u}} \boldsymbol{u}\right)=0 \tag{8.26} \end{equation*}
where the zero on the right-hand side of the last expression follows because $\boldsymbol{\nabla}_{\boldsymbol{u}} \boldsymbol{u}=0$ for a geodesic. We conclude that a timelike tangent vector is always timelike along a timelike geodesic, and similarly for null and spacelike vectors.
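The first equality in eqn 8.26 relies on the product rule together with metric compatibility, $\left(\nabla_{\alpha} \boldsymbol{g}\right)_{\mu \nu}=0$, which is the subject of Exercise 7.4. Written out in components, a sketch of the intermediate step is:

```latex
\begin{align*}
\boldsymbol{\nabla}_{\boldsymbol{u}}(\boldsymbol{u}\cdot\boldsymbol{u})
 &= u^{\alpha}\nabla_{\alpha}\left(g_{\mu\nu}u^{\mu}u^{\nu}\right) \\
 &= u^{\alpha}\left(\nabla_{\alpha}g_{\mu\nu}\right)u^{\mu}u^{\nu}
   + g_{\mu\nu}\left(u^{\alpha}\nabla_{\alpha}u^{\mu}\right)u^{\nu}
   + g_{\mu\nu}u^{\mu}\left(u^{\alpha}\nabla_{\alpha}u^{\nu}\right) \\
 &= 0 + 2\,g_{\mu\nu}u^{\mu}\left(\boldsymbol{\nabla}_{\boldsymbol{u}}\boldsymbol{u}\right)^{\nu}
  = 2\,\boldsymbol{u}\cdot\left(\boldsymbol{\nabla}_{\boldsymbol{u}}\boldsymbol{u}\right).
\end{align*}
```

The two surviving terms are equal because $g_{\mu \nu}$ is symmetric, which gives the factor of 2.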
The geodesic equation in its geometric version, $\boldsymbol{\nabla}_{\boldsymbol{u}} \boldsymbol{u}=0$, can also be used to motivate a geometric form of momentum conservation, which can be expressed as
\begin{equation*} \boldsymbol{\nabla}_{\boldsymbol{p}} \boldsymbol{p}=0 \tag{8.27} \end{equation*}
where $\boldsymbol{p}$ is the momentum.
Example 8.8
Proof: For massive particles the components of $\boldsymbol{\nabla}_{\boldsymbol{p}} \boldsymbol{p}$ may be written as
\begin{align*}
\left(\boldsymbol{\nabla}_{\boldsymbol{p}} \boldsymbol{p}\right)^{\beta} & =m u^{\alpha}\left(p^{\beta}{ }_{, \alpha}+\Gamma^{\beta}{ }_{\alpha \sigma} p^{\sigma}\right) \\
& =m\left(\frac{\mathrm{d} p^{\beta}}{\mathrm{d} \tau}+u^{\alpha} p^{\sigma} \Gamma^{\beta}{ }_{\alpha \sigma}\right) . \tag{8.28}
\end{align*}
We can always find a locally flat local inertial frame (LIF) coordinate system, where all of the $\Gamma$'s vanish at a point, and therefore flat-space momentum conservation (or $\frac{\mathrm{d} p}{\mathrm{d} \tau}=0$) can be written as $\boldsymbol{\nabla}_{\boldsymbol{p}} \boldsymbol{p}=0$. As this is a valid tensor equation in a flat frame, by the principle of general covariance, the same equation must also be true in any frame.$^{9}$

8.3 Inertial forces

The geodesic equation tells us that if spacetime gives non-zero connection coefficients $\Gamma^{\mu}{ }_{\alpha \beta}$, we observe an acceleration, even in the absence of an external force. This acceleration could be due to spacetime being curved, or simply to our choice of coordinates. We are used to accelerations resulting from forces and so we interpret the acceleration $\ddot{x}^{\mu}$ that results from the connections as corresponding to an inertial force. The inertial force on a particle of mass $m$ is given by the geodesic equation as
\begin{equation*}
f_{\text{inertial}}^{\mu}=m \ddot{x}^{\mu}=-m \Gamma^{\mu}{ }_{\alpha \lambda} \dot{x}^{\alpha} \dot{x}^{\lambda} \tag{8.29}
\end{equation*}
where dot notation means $\dot{x}^{\mu}=\mathrm{d} x^{\mu} / \mathrm{d} \tau$.
Example 8.9
In classical mechanics, if you move in an accelerating frame of reference such as the rotating earth, you feel an inertial force. Although these are sometimes known as fictional forces, there is little fictional about them to the person experiencing them.
Consider the game swingball$^{10}$ where a tennis ball is connected by a string to a post (as shown in Fig. 8.4) and executes circular motion in the horizontal plane. In the frame of the players, the ball, with mass $m$, accelerates as it swings around the pole. Referring to the figure, the tension $T$ in the string supplies a vertical component $T \cos \theta=m g$ that balances the gravitational force. The horizontal component $T \sin \theta$ is not cancelled, but instead supplies the centripetal force that maintains the circular motion, equal to $m v^{2} / r$, where $v$ is the velocity and $r$ the radius of the circular motion. From the local frame of the tennis ball the picture is rather different. The ball is at rest in this frame, so the horizontal component of the tension must be balanced by a new force $\vec{F}$. This is the centrifugal force: an outward-directed force felt by the ball as a real force, but not evident in the players' frame.
The principle of equivalence teaches us that the acceleration due to gravitation, for a single particle, is indistinguishable from the acceleration due to a particular choice of coordinates. Since free particles fall along geodesics, the geodesic equation says that acceleration $a^{\mu}$ is given by $\ddot{x}^{\mu}=-\dot{x}^{\alpha} \dot{x}^{\lambda} \Gamma^{\mu}{ }_{\alpha \lambda}$, which is to say that gravitation gives rise to an inertial force.
Example 8.10
Consider the metric that describes a weak gravitational field
\begin{equation*}
\mathrm{d} s^{2}=-[1+2 \Phi(x, y, z)] \mathrm{d} t^{2}+\left(\mathrm{d} x^{2}+\mathrm{d} y^{2}+\mathrm{d} z^{2}\right) \tag{8.30}
\end{equation*}
where $\Phi(x, y, z)=-G M /\left(x^{2}+y^{2}+z^{2}\right)^{\frac{1}{2}}$ is the gravitational potential. This can be used to give an equation of motion$^{11}$
\begin{equation*}
\frac{\mathrm{d}^{2} x^{i}}{\mathrm{d} \tau^{2}}+\frac{\partial \Phi}{\partial x^{i}}\left(\frac{\mathrm{d} t}{\mathrm{d} \tau}\right)^{2}=0 \tag{8.31}
\end{equation*}
where $x^{i}=(x, y, z)$. Comparing this to the geodesic equation, we read off the (only non-zero) connection coefficients as
\begin{equation*}
\Gamma^{i}{ }_{t t}=\frac{\partial \Phi}{\partial x^{i}}=G M \frac{x^{i}}{r^{3}}, \tag{8.32}
\end{equation*}
where $r^{2}=x^{2}+y^{2}+z^{2}$. In a non-relativistic limit, we have $\tau \approx t$. The connection therefore supplies the components of the gravitational force $-\Gamma^{i}{ }_{t t}$ in the geodesic equation.
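As a quick check of eqn 8.32, the gradient of the potential can be estimated by finite differences and compared with $G M x^{i} / r^{3}$. This is only a sketch: the value $GM = 1$ (geometrized units), the sample point and the function names are assumptions made for illustration.

```python
import math

GM = 1.0  # illustrative value in units with G = c = 1

def phi(x, y, z):
    """Newtonian potential Phi = -GM / r."""
    return -GM / math.sqrt(x * x + y * y + z * z)

def gamma_i_tt_numeric(point, h=1e-6):
    """Central-difference estimate of Gamma^i_tt = dPhi/dx^i at the given point."""
    grads = []
    for i in range(3):
        plus, minus = list(point), list(point)
        plus[i] += h
        minus[i] -= h
        grads.append((phi(*plus) - phi(*minus)) / (2 * h))
    return grads

def gamma_i_tt_exact(point):
    """Closed form GM x^i / r^3 from eqn 8.32."""
    r = math.sqrt(sum(c * c for c in point))
    return [GM * c / r ** 3 for c in point]

point = (3.0, -4.0, 12.0)  # an arbitrary point with r = 13
numeric = gamma_i_tt_numeric(point)
exact = gamma_i_tt_exact(point)
for n, e in zip(numeric, exact):
    print(f"numeric {n:+.10f}   exact {e:+.10f}")
```

The sign of each component follows the sign of $x^{i}$, so the acceleration $-\Gamma^{i}{ }_{t t}$ points back towards the origin, as expected for an attractive force.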
If a particle is subject to an externally applied 4-force $\boldsymbol{f}$ with components $f^{\mu}$, then this supplies a non-zero right-hand side of the geodesic equation and we have, in components,
\begin{equation*}
m\left(\frac{\mathrm{d}^{2} x^{\mu}}{\mathrm{d} \tau^{2}}+\frac{\mathrm{d} x^{\alpha}}{\mathrm{d} \tau} \frac{\mathrm{d} x^{\lambda}}{\mathrm{d} \tau} \Gamma^{\mu}{ }_{\alpha \lambda}\right)=f^{\mu} \tag{8.33}
\end{equation*}
$^{10}$ Swingball is more usually known as tetherball outside Britain.
Fig. 8.4 Swingball in (a) the frame of the players and (b) the frame of the ball.
$^{11}$ See the next chapter for details of the method.
Example 8.11
We can use the connection coefficients from the last chapter for motion in cylindrical coordinates to derive equations of motion using the geodesic equation. In the non-relativistic limit, we take $\tau \approx t$ and write$^{12}$
\begin{equation*}
\frac{\mathrm{d}^{2} x^{\mu}}{\mathrm{d} t^{2}}+\Gamma^{\mu}{ }_{\alpha \beta} \frac{\mathrm{d} x^{\alpha}}{\mathrm{d} t} \frac{\mathrm{d} x^{\beta}}{\mathrm{d} t}=\frac{f^{\mu}}{m} \tag{8.34}
\end{equation*}
which gives us two equations. The radial part is
\begin{align*}
f^{r} / m & =\frac{\mathrm{d}^{2} r}{\mathrm{d} t^{2}}+\Gamma^{r}{ }_{\alpha \beta} u^{\alpha} u^{\beta} \\
& =a^{r}+\Gamma^{r}{ }_{\theta \theta} u^{\theta} u^{\theta}=a^{r}-r\left(u^{\theta}\right)^{2}, \tag{8.35}
\end{align*}
where the angular velocity $u^{\theta}=\dot{\theta}$. The angular part is
\begin{align*}
f^{\theta} / m & =\frac{\mathrm{d}^{2} \theta}{\mathrm{d} t^{2}}+\Gamma^{\theta}{ }_{\alpha \beta} u^{\alpha} u^{\beta} \\
& =a^{\theta}+2 \Gamma^{\theta}{ }_{r \theta} u^{r} u^{\theta}=a^{\theta}+\frac{2 u^{r} u^{\theta}}{r} . \tag{8.36}
\end{align*}
The trajectory of a particle following uniform circular motion in flat spacetime is not a geodesic: the geodesics are straight lines. From eqn 8.35 we see that if a particle is undergoing uniform circular motion in this coordinate system we have $a^{r}=0$ and so an inward-directed external force $f^{r}$ (the centripetal force) must be applied in order to balance the inertial force $m r\left(u^{\theta}\right)^{2}$. If we want the circular motion to be uniform with $a^{\theta}=0$, then having $u^{r}=0$ guarantees no component of force is needed in the $\theta$ direction.
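Eqns 8.35 and 8.36 can be explored numerically. The sketch below is illustrative rather than definitive: it rearranges the two equations for $a^{r}$ and $a^{\theta}$, integrates the force-free case with a hand-written Runge-Kutta step, and converts back to Cartesian coordinates to confirm that the path is a straight line. It also checks that uniform circular motion needs $f^{r}=-m r\left(u^{\theta}\right)^{2}$. The function names and initial conditions are assumptions chosen for the example.

```python
import math

def accelerations(state, f_r=0.0, f_theta=0.0, m=1.0):
    """Solve eqns 8.35 and 8.36 for the coordinate accelerations a^r and a^theta."""
    r, _theta, u_r, u_theta = state
    a_r = f_r / m + r * u_theta ** 2
    a_theta = f_theta / m - 2.0 * u_r * u_theta / r
    return a_r, a_theta

def rhs(state):
    """First-order system for a force-free particle (f^r = f^theta = 0)."""
    _r, _theta, u_r, u_theta = state
    a_r, a_theta = accelerations(state)
    return (u_r, u_theta, a_r, a_theta)

def rk4_step(state, h):
    """Advance the state by one fourth-order Runge-Kutta step of size h."""
    def nudge(k, factor):
        return tuple(s + factor * ki for s, ki in zip(state, k))
    k1 = rhs(state)
    k2 = rhs(nudge(k1, h / 2))
    k3 = rhs(nudge(k2, h / 2))
    k4 = rhs(nudge(k3, h))
    return tuple(s + h / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

# Free particle starting at (x, y) = (1, 0), moving in the +y direction at unit speed.
state = (1.0, 0.0, 0.0, 1.0)      # (r, theta, u^r, u^theta)
for _ in range(1000):             # integrate from t = 0 to t = 1
    state = rk4_step(state, 0.001)
x = state[0] * math.cos(state[1])
y = state[0] * math.sin(state[1])
print(f"free particle at t = 1: x = {x:.9f}, y = {y:.9f}")

# Uniform circular motion with r = 1, u^theta = 1, m = 1 needs f^r = -m r (u^theta)^2 = -1.
a_r_circular, a_theta_circular = accelerations((1.0, 0.0, 0.0, 1.0), f_r=-1.0)
print(f"circular motion: a^r = {a_r_circular}, a^theta = {a_theta_circular}")
```

In the force-free case the particle should arrive close to $(x, y)=(1,1)$, the point reached by straight-line motion at unit speed, even though both $r$ and $\theta$ change along the way.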

8.4 Geodesics for photons

One of the most striking predictions of general relativity is that curved spacetime affects the motion of light. Photons, the particles of light, are massless and, in the absence of interactions, fall along null geodesics.$^{13}$ At each point along a null geodesic a light cone will be tangent to the curve, as shown in Fig. 8.5. A simple way to analyse the paths of light is, therefore, to consider directly the constraint $\mathrm{d} s^{2}=0$ for photons, as demonstrated in the following example.
Example 8.12
Consider a light ray travelling in a weak, Newtonian gravitational field. Earlier we used the line element in eqn 8.30 to describe the geometry. Recall that the more accurate expression, which includes the correction to the spacelike parts, is given by$^{14}$
\begin{equation*}
\mathrm{d} s^{2}=-[1+2 \Phi(x, y, z)] \mathrm{d} t^{2}+[1-2 \Phi(x, y, z)]\left(\mathrm{d} x^{2}+\mathrm{d} y^{2}+\mathrm{d} z^{2}\right)
\end{equation*}
where $\Phi(x, y, z)=-G M / r$. Setting $\mathrm{d} s^{2}=0$ for photons, we find an expression for the null geodesics in terms of the coordinates $r$ and $t$ which is
\begin{equation*}
\frac{\mathrm{d} r}{\mathrm{d} t}=\left(\frac{1+2 \Phi}{1-2 \Phi}\right)^{\frac{1}{2}}=\left(\frac{r-2 G M}{r+2 G M}\right)^{\frac{1}{2}} \tag{8.38}
\end{equation*}
which, as we saw in Chapter 5, can be used to investigate the light cone structure of this geometry.
This expression can also be used to demonstrate a famous relativistic effect known as Shapiro time delay. A light pulse is sent from a distant planet to the Earth, passing close to the Sun, with a distance of closest approach of $b$, as shown in Fig. 8.6. If we imagine light travelling between two coordinate points $\left(-d_{\mathrm{p}}, y_{0}, z_{0}\right)$ and $\left(d_{\mathrm{e}}, y_{0}, z_{0}\right)$ along the $x$-direction, then $\mathrm{d} r=\mathrm{d} x$. Expanding the equation describing the geodesic in the limit of small$^{15}$ $G M / r$ we find the coordinate time elapsed between the events is
\begin{equation*}
t=\int \mathrm{d} x\left(1+\frac{2 G M}{r}\right)=\int \mathrm{d} x\left[1+\frac{2 G M}{\left(x^{2}+y^{2}+z^{2}\right)^{\frac{1}{2}}}\right] . \tag{8.39}
\end{equation*}
For a path from $x=0$ to $x=x_{0}$, we can integrate$^{16}$ to find, for small $b / x_{0}$,
\begin{equation*}
t \approx x_{0}+2 G M \ln \frac{2\left|x_{0}\right|}{b}, \tag{8.40}
\end{equation*}
to leading order. The time taken by the light pulse is the usual time $t=x_{0} / c$ plus a relativistic delay, caused by the gravitational field of the Sun. For the geometry in the figure, in which the delay for each leg of the journey adds, we find a total relativistic time delay of
\begin{equation*}
\Delta t=2 G M \ln \frac{4 d_{\mathrm{p}} d_{\mathrm{e}}}{b^{2}} \tag{8.41}
\end{equation*}
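The step from eqn 8.40 to eqn 8.41 can be checked with a few lines of arithmetic. The sketch below assumes units with $G=c=1$ and uses made-up distances purely for illustration; the function names are likewise hypothetical. It compares the exact integral of the delay term (using the standard integral quoted in the margin) with the leading-order logarithm, and confirms that the two legs add up to the total delay.

```python
import math

def delay_exact(x0, b, GM):
    """2GM * integral from 0 to x0 of dx / sqrt(x^2 + b^2), via the standard integral."""
    return 2.0 * GM * math.log((x0 + math.sqrt(x0 * x0 + b * b)) / b)

def delay_approx(x0, b, GM):
    """Leading-order delay 2GM ln(2|x0|/b) from eqn 8.40, valid for b << |x0|."""
    return 2.0 * GM * math.log(2.0 * abs(x0) / b)

def total_delay(d_p, d_e, b, GM):
    """Total delay for both legs of the journey, eqn 8.41."""
    return 2.0 * GM * math.log(4.0 * d_p * d_e / b ** 2)

# Illustrative values only (G = c = 1): distances measured in units of b.
GM, b = 1.0, 1.0
d_p, d_e = 300.0, 200.0

legs = delay_approx(d_p, b, GM) + delay_approx(d_e, b, GM)
total = total_delay(d_p, d_e, b, GM)
error_one_leg = abs(delay_exact(d_p, b, GM) - delay_approx(d_p, b, GM))

print(f"sum of legs      = {legs:.10f}")
print(f"eqn 8.41         = {total:.10f}")
print(f"exact vs approx  = {error_one_leg:.2e}")
```

The difference between the exact and approximate one-leg delays is of order $\left(b / x_{0}\right)^{2}$, which is why the logarithmic form is adequate when the endpoints are far from the Sun compared with $b$.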

Chapter summary

  • Geodesics are the paths that free particles fall along in general relativity. They can be found by extremizing the path between two events using the calculus of variations, which is much simplified if length parametrization is used. Free massive particles follow timelike geodesics.
  • The equation of motion
\begin{equation*}
m\left(\frac{\mathrm{d}^{2} x^{\mu}}{\mathrm{d} \tau^{2}}+\Gamma^{\mu}{ }_{\alpha \beta} \frac{\mathrm{d} x^{\alpha}}{\mathrm{d} \tau} \frac{\mathrm{d} x^{\beta}}{\mathrm{d} \tau}\right)=f^{\mu} \tag{8.42}
\end{equation*}
relates the acceleration to the geometry and any external forces. The term that includes the connection coefficients gives rise to inertial forces. When $f^{\mu}=0$ the equation describes a geodesic.
  • Light travels along null geodesics and this is easiest to analyse using the null condition $\mathrm{d} s^{2}=0$.
Irwin Shapiro (1929- ) suggested this as a fourth test of general relativity. The other three so-called classical solar-system tests of general relativity are: (i) the perihelion precession of Mercury; (ii) the deflection of light by the Sun (both described in Chapter 23); and (iii) the gravitational redshift of light (Chapter 13).
$^{15}$ Restoring factors of $c$ this is the limit of small $G M / c^{2} r$.
$^{16}$ Use the result $\int \frac{\mathrm{d} x}{\sqrt{x^{2}+a^{2}}}=\ln \left(x+\sqrt{x^{2}+a^{2}}\right)$.

Fig. 8.6 The geometry for the Shapiro time delay, showing the planet (p), and the Earth (e), with the Sun (S) at the origin.

Exercises

(8.1) We will find the shortest distance between two points in flat space, expressed in cylindrical polar coordinates, where the interval is written as
\begin{equation*}
s=\int\left(\mathrm{d} r^{2}+r^{2} \mathrm{d} \theta^{2}\right)^{\frac{1}{2}} . \tag{8.43}
\end{equation*}
(a) Show that the equations of motion are
\begin{equation*}
\frac{\mathrm{d}^{2} r}{\mathrm{d} \lambda^{2}}=r\left(\frac{\mathrm{d} \theta}{\mathrm{d} \lambda}\right)^{2}, \quad \frac{\mathrm{d}}{\mathrm{d} \lambda}\left(r^{2} \frac{\mathrm{d} \theta}{\mathrm{d} \lambda}\right)=0 . \tag{8.44}
\end{equation*}
(b) Using the equations of motion, show that the equation for the geodesic obeys
\begin{equation*}
r^{2}=\lambda^{2}+a^{2} \tag{8.45}
\end{equation*}
and
\begin{equation*}
\frac{\mathrm{d} \theta}{\mathrm{d} \lambda}=\frac{a}{\lambda^{2}+a^{2}} \tag{8.46}
\end{equation*}
leading to a solution
\begin{equation*}
a \tan \left(\theta-\theta_{0}\right)=\lambda \tag{8.47}
\end{equation*}
(8.2) Despite its complicated appearance, the solution in the previous problem does represent a straight line expressed in cylindrical coordinates. We can show this by referring to Fig. 8.7, which shows how $a$ and $\theta_{0}$ should be interpreted.
(a) Eliminate $\lambda$ to show
\begin{equation*}
r=\frac{a}{\cos \left(\theta-\theta_{0}\right)} . \tag{8.48}
\end{equation*}
Fig. 8.7 The geometry of a straight line in polar coordinates.
(b) In Cartesian coordinates, a straight line can be written as $\alpha x+\beta y=\gamma$. Using the substitutions
\begin{equation*}
\begin{gathered}
a=\frac{\gamma}{\left(\alpha^{2}+\beta^{2}\right)^{\frac{1}{2}}}, \quad \cos \theta_{0}=\frac{\alpha}{\left(\alpha^{2}+\beta^{2}\right)^{\frac{1}{2}}}, \\
\sin \theta_{0}=\frac{\beta}{\left(\alpha^{2}+\beta^{2}\right)^{\frac{1}{2}}},
\end{gathered}
\end{equation*}
show that eqn 8.48 is, indeed, a description of a straight line.
(8.3) Suppose we did not know about spacetime curvature nor the details of geometry. We would still need to use the geodesic equation, as we shall demonstrate. Consider a particle which is accelerating. In a frame that moves along with the particle, in which its position is $\xi^{\mu}$, the particle can't feel its own weight, which is to say that no forces act on it and it undergoes no acceleration. As a result, in this frame, $\mathrm{d}^{2} \xi^{\mu} / \mathrm{d} \tau^{2}=0$. Now consider how this particle's trajectory appears in some other frame with coordinates $x^{\nu}$. By using the chain rule, show
\begin{equation*}
\frac{\mathrm{d}^{2} x^{\nu}}{\mathrm{d} \tau^{2}}+\frac{\partial x^{\nu}}{\partial \xi^{\mu}} \frac{\partial^{2} \xi^{\mu}}{\partial x^{\lambda} \partial x^{\sigma}} \frac{\mathrm{d} x^{\lambda}}{\mathrm{d} \tau} \frac{\mathrm{d} x^{\sigma}}{\mathrm{d} \tau}=0 \tag{8.49}
\end{equation*}
and interpret this equation.
(8.4) The covariant derivative of a 1-form $\tilde{\boldsymbol{\sigma}}$ will be discussed in Part V. For now we can simply note that it can be written as
\begin{equation*}
\boldsymbol{\nabla}_{\mu} \tilde{\boldsymbol{\sigma}}=\sigma_{\nu ; \mu} \boldsymbol{\omega}^{\nu} \tag{8.50}
\end{equation*}
with
\begin{equation*}
\sigma_{\nu ; \mu}=\sigma_{\nu, \mu}-\Gamma^{\lambda}{ }_{\mu \nu} \sigma_{\lambda} \tag{8.51}
\end{equation*}
(a) Use this to compute a geodesic equation for a velocity 1-form and hence for acceleration $\ddot{x}_{\mu}$.
(b) Use the result of part (a) to prove that the equation of motion in eqn 8.33 for massive particles has the property that $\boldsymbol{f} \cdot \boldsymbol{u}=0$, for a force $\boldsymbol{f}$ and instantaneous velocity $\boldsymbol{u}$.

Geodesic equations and connection coefficients

I know now that if I break my neck by falling off a cliff, my death is not to be blamed on the force of gravity (what does not exist is necessarily guiltless), but on the fact that I did not maintain the first curvature of my world-line, exchanging its security for a dangerous geodesic.
John Lighton Synge (1897-1995)
Synge's words, quoted above, remind us that our new geometric perspective on gravity motivates us to think about the curvature of our world line.$^{1}$ In the last two chapters, we saw that the way basis vectors change as we move through spacetime is reflected in the connection coefficients $\Gamma^{\mu}{ }_{\alpha \beta}$, which feature in the geodesic equation, the equation of motion for a particle in free fall. The connection coefficients are important, not only for the role they play in the geodesic equation, but because they tell us about the curvature of spacetime itself. In this chapter, we describe a method to extract the connection coefficients. As shown in Fig. 9.1, the idea is to input the metric and output the connection. The most important point of this chapter is the following. The metric field of spacetime, via the line element $\mathrm{d} s^{2}=g_{\mu \nu} \mathrm{d} x^{\mu} \mathrm{d} x^{\nu}$, generates the geodesics that freely falling particles follow and, therefore, the connection coefficients.

9.1 Finding connection coefficients

Let's formulate our method. The interval between spacetime points $a$ and $b$ can be written as the integral$^{2}$
\begin{equation*}
s=\int_{a}^{b} \mathrm{d} \lambda\left|g_{\mu \nu} \frac{\mathrm{d} x^{\mu}}{\mathrm{d} \lambda} \frac{\mathrm{d} x^{\nu}}{\mathrm{d} \lambda}\right|^{\frac{1}{2}} . \tag{9.3}
\end{equation*}
We use the Euler-Lagrange equations on the integrand to find the equations of motion. We saw in the last chapter that our expressions can be simplified by choosing length parametrization after the first set of derivatives have been taken and this is necessary to interpret $s$ as the interval between spacetime points.$^{3}$ The equations of motion can be written in the form of the geodesic equation
\begin{equation*}
\frac{\mathrm{d}^{2} x^{\mu}}{\mathrm{d} \lambda^{2}}+\frac{\mathrm{d} x^{\alpha}}{\mathrm{d} \lambda} \frac{\mathrm{d} x^{\beta}}{\mathrm{d} \lambda} \Gamma^{\mu}{ }_{\alpha \beta}=0 . \tag{9.4}
\end{equation*}
$^{1}$ A world line which, we very much hope, will avoid any cliff falls.
$\boldsymbol{g} \Longrightarrow \mathrm{d} s^{2} \Longrightarrow \Gamma$
Fig. 9.1 The metric generates the connection coefficients.
$^{2}$ For spacelike curves we have $\mathrm{d} s^{2}>0$, and we can write the interval as
\begin{equation*}
\Delta l=\int_{a}^{b} \mathrm{d} \lambda\left(g_{\mu \nu} \frac{\mathrm{d} x^{\mu}}{\mathrm{d} \lambda} \frac{\mathrm{d} x^{\nu}}{\mathrm{d} \lambda}\right)^{\frac{1}{2}} . \tag{9.1}
\end{equation*}
This is the proper length along the curve. Massive particles traverse timelike curves which have $\mathrm{d} s^{2}<0$. We can write the timelike interval
\begin{equation*}
\Delta \tau=\int_{a}^{b} \mathrm{d} \lambda\left(-g_{\mu \nu} \frac{\mathrm{d} x^{\mu}}{\mathrm{d} \lambda} \frac{\mathrm{d} x^{\nu}}{\mathrm{d} \lambda}\right)^{\frac{1}{2}} \tag{9.2}
\end{equation*}
This equation gives us the proper time that elapses for an observer travelling along the world line.
$^{3}$ We described the need for this in the previous chapter. For massive particles, which follow timelike geodesics, length parametrization, involving setting $\lambda=\tau$, is also needed to ensure that the velocity vector $\boldsymbol{u}$, which is the tangent of the particle's world line, is constrained according to $\boldsymbol{u} \cdot \boldsymbol{u}=-1$.
$\curvearrowright$ The rest of this chapter goes through in detail how to extract connection coefficients and calculate geodesics. A reader impatient to get on to the heart of general relativity can skip the rest of this chapter on first reading.
$^{4}$ These geometries reappear in several of the exercises and examples later in the book.
Tips for these calculations:
  • To save on writing, in step I it's sometimes useful to write $\frac{\mathrm{d} x^{\mu}}{\mathrm{d} \lambda}$ as $\dot{x}^{\mu}$.
  • In step IV, we often employ the chain rule, noting that $\frac{\mathrm{d}}{\mathrm{d} \lambda} f\left(x^{\mu}\right)=\frac{\partial f\left(x^{\mu}\right)}{\partial x^{\mu}} \frac{\mathrm{d} x^{\mu}}{\mathrm{d} \lambda}$.
  • In step V, equations of motion of the form $\ddot{x}^{\mu}+2 F \dot{x}^{\alpha} \dot{x}^{\beta}=0$ for $\alpha \neq \beta$ yield $\Gamma^{\mu}{ }_{\alpha \beta}=F$, owing to the summation convention in the geodesic equation.
  • The length parametrization condition $L=1$ is often a useful, additional constraint when solving the equations of motion.
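The factor of 2 in the tip on step V can be made explicit with a short worked step. As an illustration, take the $\phi$ equation of motion on the unit sphere (eqn 9.6). Writing out the sum over $\alpha$ and $\beta$ in the geodesic equation, the two cross terms are equal because the connection is symmetric in its lower indices:

```latex
\begin{align*}
\ddot{\phi}+\Gamma^{\phi}{}_{\alpha\beta}\,\dot{x}^{\alpha}\dot{x}^{\beta}
 &=\ddot{\phi}
   +\Gamma^{\phi}{}_{\theta\phi}\,\dot{\theta}\dot{\phi}
   +\Gamma^{\phi}{}_{\phi\theta}\,\dot{\phi}\dot{\theta}
   +\left(\text{terms in } \dot{\theta}^{2} \text{ and } \dot{\phi}^{2}\right) \\
 &=\ddot{\phi}
   +2\,\Gamma^{\phi}{}_{\theta\phi}\,\dot{\theta}\dot{\phi}
   +\left(\text{terms in } \dot{\theta}^{2} \text{ and } \dot{\phi}^{2}\right).
\end{align*}
```

Comparing with $\ddot{\phi}+2 \cot \theta\, \dot{\phi} \dot{\theta}=0$ then gives $\Gamma^{\phi}{ }_{\theta \phi}=\cot \theta$ rather than $2 \cot \theta$, with the coefficients of the $\dot{\theta}^{2}$ and $\dot{\phi}^{2}$ terms vanishing in this case.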
By comparing the equations of motion we can simply read off the connection coefficients.
This procedure formalizes the method used in the examples in the previous chapter. It can be summarized as follows:
Step I: From the metric line element $\mathrm{d} s^{2}$, write a parametrized expression for the spacetime interval $s=\int L \,\mathrm{d} \lambda$, where $\lambda$ is the parameter.
Step II: Calculate $\frac{\partial L}{\partial\left(\frac{\mathrm{d} x^{\mu}}{\mathrm{d} \lambda}\right)}$ and $\frac{\partial L}{\partial x^{\mu}}$.
Step III: Choose length parametrization such that $L=1$.
Step IV: Calculate $\frac{\mathrm{d}}{\mathrm{d} \lambda} \frac{\partial L}{\partial\left(\frac{\mathrm{d} x^{\mu}}{\mathrm{d} \lambda}\right)}$ and insert the values into the E-L equations.
Step V: Read off the connection coefficients, remembering that $\Gamma^{\mu}{ }_{\alpha \beta}=\Gamma^{\mu}{ }_{\beta \alpha}$.
We shall work through a number of examples demonstrating how to extract connection coefficients. In each case, we start with a metric and end with connection coefficients.$^{4}$
Example 9.1
Let's try the two-dimensional space on the surface of a unit sphere. The interval (step I) is
\begin{align*}
s & =\int\left(\mathrm{d} \theta^{2}+\sin ^{2} \theta \,\mathrm{d} \phi^{2}\right)^{\frac{1}{2}} \\
& =\int \mathrm{d} \lambda\left[\left(\frac{\mathrm{d} \theta}{\mathrm{d} \lambda}\right)^{2}+\sin ^{2} \theta\left(\frac{\mathrm{d} \phi}{\mathrm{d} \lambda}\right)^{2}\right]^{\frac{1}{2}} \tag{9.5}
\end{align*}
Therefore, the metric has components $g_{\theta \theta}=1$ and $g_{\phi \phi}=\sin ^{2} \theta$. We saw [in Exercise 8.1 from the last chapter] that (following steps II-IV) the equations of motion are
\begin{align*}
\frac{\mathrm{d}^{2} \theta}{\mathrm{d} \lambda^{2}}-\sin \theta \cos \theta\left(\frac{\mathrm{d} \phi}{\mathrm{d} \lambda}\right)^{2} & =0 \\
\frac{\mathrm{d}^{2} \phi}{\mathrm{d} \lambda^{2}}+2 \cot \theta \frac{\mathrm{d} \phi}{\mathrm{d} \lambda} \frac{\mathrm{d} \theta}{\mathrm{d} \lambda} & =0 \tag{9.6}
\end{align*}
and we read off the connection coefficients (step V)
\begin{equation*}
\Gamma^{\theta}{ }_{\phi \phi}=-\sin \theta \cos \theta, \quad \Gamma^{\phi}{ }_{\theta \phi}=\cot \theta . \tag{9.7}
\end{equation*}
We can examine different curved surfaces, such as the parabolic space of the next example.

Example 9.2

A parabolic surface with line element $\mathrm{d} s^{2}=\left(1+a^{2} r^{2}\right) \mathrm{d} r^{2}+r^{2} \,\mathrm{d} \theta^{2}$ has interval (step I)
\begin{equation*} s=\int \mathrm{d} \lambda\left[\left(1+a^{2} r^{2}\right)\left(\frac{\mathrm{d} r}{\mathrm{d} \lambda}\right)^{2}+r^{2}\left(\frac{\mathrm{d} \theta}{\mathrm{d} \lambda}\right)^{2}\right]^{\frac{1}{2}} \tag{9.8} \end{equation*}
Step II of the method gives us
\begin{align*}
\frac{\partial L}{\partial\left(\frac{\mathrm{d} r}{\mathrm{d} \lambda}\right)} & =\frac{1}{L}\left(1+a^{2} r^{2}\right) \frac{\mathrm{d} r}{\mathrm{d} \lambda}, & \frac{\partial L}{\partial\left(\frac{\mathrm{d} \theta}{\mathrm{d} \lambda}\right)} & =\frac{1}{L} r^{2} \frac{\mathrm{d} \theta}{\mathrm{d} \lambda}, \\
\frac{\partial L}{\partial r} & =\frac{1}{L}\left[a^{2} r\left(\frac{\mathrm{d} r}{\mathrm{d} \lambda}\right)^{2}+r\left(\frac{\mathrm{d} \theta}{\mathrm{d} \lambda}\right)^{2}\right], & \frac{\partial L}{\partial \theta} & =0 . \tag{9.9}
\end{align*}
Now use length parametrization (step III) and find the equations of motion (step IV). Here's the first
\begin{equation*} \frac{\mathrm{d}^{2} r}{\mathrm{d} \lambda^{2}}+\frac{a^{2} r}{1+a^{2} r^{2}}\left(\frac{\mathrm{d} r}{\mathrm{d} \lambda}\right)^{2}-\frac{r}{1+a^{2} r^{2}}\left(\frac{\mathrm{d} \theta}{\mathrm{d} \lambda}\right)^{2}=0 \tag{9.10} \end{equation*}
And the second
\begin{equation*} \frac{\mathrm{d}^{2} \theta}{\mathrm{d} \lambda^{2}}+\frac{2}{r} \frac{\mathrm{d} r}{\mathrm{d} \lambda} \frac{\mathrm{d} \theta}{\mathrm{d} \lambda}=0 \tag{9.11} \end{equation*}
We read off the connection coefficients (step V)
\begin{equation*} \Gamma^{\theta}{}_{r \theta}=\frac{1}{r}, \quad \Gamma^{r}{}_{r r}=\frac{a^{2} r}{1+a^{2} r^{2}}, \quad \Gamma^{r}{}_{\theta \theta}=-\frac{r}{1+a^{2} r^{2}} \tag{9.12} \end{equation*}
As expected, these reduce to the flat-plane connection coefficients in the case that $a=0$.
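The passage from step II to the two equations of motion in this example can be filled in as follows. This is only a sketch: it uses the Euler-Lagrange equation with $L=1$, so that $\partial L / \partial r=a^{2} r \dot{r}^{2}+r \dot{\theta}^{2}$ and $\partial L / \partial \theta=0$, and it writes dots for derivatives with respect to $\lambda$ (a shorthand adopted here for brevity).
\begin{align*}
\frac{\mathrm{d}}{\mathrm{d} \lambda}\left[\left(1+a^{2} r^{2}\right) \dot{r}\right] & =\left(1+a^{2} r^{2}\right) \ddot{r}+2 a^{2} r \dot{r}^{2}=a^{2} r \dot{r}^{2}+r \dot{\theta}^{2}
& & \Longrightarrow \quad \ddot{r}+\frac{a^{2} r}{1+a^{2} r^{2}} \dot{r}^{2}-\frac{r}{1+a^{2} r^{2}} \dot{\theta}^{2}=0, \\
\frac{\mathrm{d}}{\mathrm{d} \lambda}\left[r^{2} \dot{\theta}\right] & =r^{2} \ddot{\theta}+2 r \dot{r} \dot{\theta}=0
& & \Longrightarrow \quad \ddot{\theta}+\frac{2}{r} \dot{r} \dot{\theta}=0 .
\end{align*}
The mixed term $\dot{r} \dot{\theta}$ carries the coefficient $2 \Gamma^{\theta}{}_{r \theta}$, which is why the factor $2 / r$ in eqn 9.11 becomes $\Gamma^{\theta}{}_{r \theta}=1 / r$ in eqn 9.12.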
We can examine more exotic spaces still, such as the interesting Poincaré half plane.$^{5}$
$^{5}$ Henri Poincaré (1854-1912). The Poincaré half plane provides a model of hyperbolic geometry. See Exercise 9.8 for an introduction to the Poincaré half plane and Chapters 16 and 19 for more discussion of hyperbolic spaces.

Example 9.3

The Poincaré half plane has a metric
\begin{equation*} \mathrm{d} s^{2}=\frac{1}{r^{2}}\left(\mathrm{d} r^{2}+\mathrm{d} x^{2}\right) \tag{9.13} \end{equation*}
which is defined for $r>0$. The interval (step I) is
\begin{equation*} s=\int \mathrm{d} \lambda\left[\frac{1}{r^{2}}\left(\frac{\mathrm{d} r}{\mathrm{d} \lambda}\right)^{2}+\frac{1}{r^{2}}\left(\frac{\mathrm{d} x}{\mathrm{d} \lambda}\right)^{2}\right]^{\frac{1}{2}} \tag{9.14} \end{equation*}
Put this through the standard machine (step II)
\begin{align*}
\frac{\partial L}{\partial\left(\frac{\mathrm{d} r}{\mathrm{d} \lambda}\right)} & =\frac{1}{L} \frac{1}{r^{2}} \frac{\mathrm{d} r}{\mathrm{d} \lambda}, & \frac{\partial L}{\partial\left(\frac{\mathrm{d} x}{\mathrm{d} \lambda}\right)} & =\frac{1}{L} \frac{1}{r^{2}} \frac{\mathrm{d} x}{\mathrm{d} \lambda}, \\
\frac{\partial L}{\partial r} & =-\frac{1}{L} \frac{1}{r^{3}}\left[\left(\frac{\mathrm{d} r}{\mathrm{d} \lambda}\right)^{2}+\left(\frac{\mathrm{d} x}{\mathrm{d} \lambda}\right)^{2}\right], & \frac{\partial L}{\partial x} & =0 .
\end{align*}
Now use length parametrization (step III) so that we make $L=1$. We then find
\begin{align*}
\frac{\mathrm{d}}{\mathrm{d} \lambda}\left(\frac{1}{r^{2}} \frac{\mathrm{d} r}{\mathrm{d} \lambda}\right) & =\frac{1}{r^{2}} \frac{\mathrm{d}^{2} r}{\mathrm{d} \lambda^{2}}-\frac{2}{r^{3}} \frac{\mathrm{d} r}{\mathrm{d} \lambda} \frac{\mathrm{d} r}{\mathrm{d} \lambda}, \\
\frac{\mathrm{d}}{\mathrm{d} \lambda}\left(\frac{1}{r^{2}} \frac{\mathrm{d} x}{\mathrm{d} \lambda}\right) & =\frac{1}{r^{2}} \frac{\mathrm{d}^{2} x}{\mathrm{d} \lambda^{2}}-\frac{2}{r^{3}} \frac{\mathrm{d} r}{\mathrm{d} \lambda} \frac{\mathrm{d} x}{\mathrm{d} \lambda} . \tag{9.15}
\end{align*}
We obtain equations of motion (step IV)
\begin{align*}
\frac{\mathrm{d}^{2} r}{\mathrm{d} \lambda^{2}}-\frac{2}{r} \frac{\mathrm{d} r}{\mathrm{d} \lambda} \frac{\mathrm{d} r}{\mathrm{d} \lambda} & =-\frac{1}{r}\left[\left(\frac{\mathrm{d} r}{\mathrm{d} \lambda}\right)^{2}+\left(\frac{\mathrm{d} x}{\mathrm{d} \lambda}\right)^{2}\right], \\
\frac{\mathrm{d}^{2} x}{\mathrm{d} \lambda^{2}}-\frac{2}{r} \frac{\mathrm{d} r}{\mathrm{d} \lambda} \frac{\mathrm{d} x}{\mathrm{d} \lambda} & =0 . \tag{9.16}
\end{align*}
From the first of these we obtain
\begin{equation*} \frac{\mathrm{d}^{2} r}{\mathrm{d} \lambda^{2}}-\frac{1}{r}\left(\frac{\mathrm{d} r}{\mathrm{d} \lambda}\right)^{2}+\frac{1}{r}\left(\frac{\mathrm{d} x}{\mathrm{d} \lambda}\right)^{2}=0 \tag{9.17} \end{equation*}
We read off connection coefficients (step V)
\begin{equation*} \Gamma^{x}{}_{x r}=\Gamma^{x}{}_{r x}=-\frac{1}{r}, \quad \Gamma^{r}{}_{r r}=-\frac{1}{r}, \quad \Gamma^{r}{}_{x x}=\frac{1}{r} . \tag{9.18} \end{equation*}
The geodesics themselves turn out to be circular arcs and are examined in Exercise 9.8.
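The claim about circular arcs can be checked numerically before attempting the exercise. The sketch below integrates eqns 9.16 and 9.17 with a classical fourth-order Runge-Kutta scheme and monitors two quantities: the distance from a circle centred on the $x$ axis, and the length-parametrization condition $L=1$. The function names, the step size, and the starting point (the top of a circle of radius 2 centred on $x=0.5$) are illustrative assumptions and do not come from the text.

```python
import math

def geodesic_rhs(state):
    """Eqns 9.16 and 9.17 rewritten as a first-order system.

    state = (r, x, r_dot, x_dot), with dots meaning d/d(lambda).
    """
    r, x, r_dot, x_dot = state
    r_ddot = (r_dot**2 - x_dot**2) / r   # eqn 9.17 solved for r''
    x_ddot = 2.0 * r_dot * x_dot / r     # eqn 9.16 solved for x''
    return (r_dot, x_dot, r_ddot, x_ddot)

def rk4_step(state, h):
    """Advance the state by one Runge-Kutta step of size h."""
    def shifted(slope, factor):
        return tuple(s + factor * k for s, k in zip(state, slope))
    k1 = geodesic_rhs(state)
    k2 = geodesic_rhs(shifted(k1, h / 2))
    k3 = geodesic_rhs(shifted(k2, h / 2))
    k4 = geodesic_rhs(shifted(k3, h))
    return tuple(
        s + h / 6 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )

def integrate(state, h, n_steps):
    """Return the list of states visited, including the starting state."""
    path = [state]
    for _ in range(n_steps):
        state = rk4_step(state, h)
        path.append(state)
    return path

# Illustrative starting data (an assumption of this sketch): begin at the top
# of a circle of radius RADIUS centred on (x, r) = (X_CENTRE, 0), moving in
# the +x direction with r' = 0 and x' = r, so that sqrt(r'^2 + x'^2) / r = 1.
RADIUS, X_CENTRE = 2.0, 0.5
start = (RADIUS, X_CENTRE, 0.0, RADIUS)
path = integrate(start, h=1e-3, n_steps=2000)

# Deviation from the circle (x - x0)^2 + r^2 = a^2 along the whole path.
circle_error = max(abs((x - X_CENTRE)**2 + r**2 - RADIUS**2) for r, x, _, _ in path)
# Deviation from the length-parametrization condition L = 1.
speed_error = max(abs(math.sqrt(rd**2 + xd**2) / r - 1.0) for r, _, rd, xd in path)

print(f"largest deviation from the circle: {circle_error:.2e}")
print(f"largest deviation from L = 1:      {speed_error:.2e}")
```

Both deviations should stay at the level of the integration error, which is consistent with the path being a circular arc traversed at unit speed in the metric of the half plane.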

9.2 The geodesic equation from the action

So far we have looked at a selection of special cases, evaluating spacelike intervals for space-only metrics with a $(+++)$ signature.$^{6}$ However, using the Euler-Lagrange equations, we should be able to derive the equation of motion for a massive particle in a general spacetime, once and for all, from the action
\begin{equation*} S=-m \int \mathrm{d} \tau\left(-g_{\mu \nu} \frac{\mathrm{d} x^{\mu}}{\mathrm{d} \tau} \frac{\mathrm{d} x^{\nu}}{\mathrm{d} \tau}\right)^{\frac{1}{2}} \tag{9.19} \end{equation*}
This action is proportional to the proper time interval along a world line parametrized by the proper time $\tau$. We can therefore extremize the action using the same procedure as before. We already know what the answer must be: the geodesic equation from the previous chapter. However, this procedure will also provide a useful and simple formula for extracting the connection coefficients directly from the metric.$^{7}$
$^{6}$ Lots more examples can be found in the exercises.
$^{7}$ Although the resulting expression is useful, it is often quicker in practice to use the five-point method to extract the coefficients directly from the action.
Example 9.4
We identify (step I)
\begin{equation*} L=-m\left(-g_{\mu \nu} \frac{\mathrm{d} x^{\mu}}{\mathrm{d} \tau} \frac{\mathrm{d} x^{\nu}}{\mathrm{d} \tau}\right)^{\frac{1}{2}} \tag{9.20} \end{equation*}
Step II yields
\begin{equation*} \frac{\partial L}{\partial \dot{x}^{\mu}}=m g_{\mu \nu} \frac{\mathrm{d} x^{\nu}}{\mathrm{d} \tau} \frac{1}{L} \tag{9.21} \end{equation*}
along with a force term
\begin{equation*} \frac{\partial L}{\partial x^{\nu}}=\frac{m}{2} \frac{\partial g_{\mu \sigma}}{\partial x^{\nu}} \frac{\mathrm{d} x^{\mu}}{\mathrm{d} \tau} \frac{\mathrm{d} x^{\sigma}}{\mathrm{d} \tau} \frac{1}{L} \tag{9.22} \end{equation*}
Choose the parametrization such that $L=1$ (step III) and we find, at step IV, that
\begin{equation*} \frac{\mathrm{d}}{\mathrm{d} \tau} \frac{\partial L}{\partial \dot{x}^{\nu}}=m \frac{\mathrm{d}}{\mathrm{d} \tau}\left(g_{\mu \nu} \frac{\mathrm{d} x^{\mu}}{\mathrm{d} \tau}\right) . \tag{9.23} \end{equation*}
The Euler-Lagrange equation then reads
\begin{equation*} \frac{\mathrm{d}}{\mathrm{d} \tau}\left(g_{\mu \nu} \frac{\mathrm{d} x^{\mu}}{\mathrm{d} \tau}\right)=\frac{1}{2} \frac{\partial g_{\mu \sigma}}{\partial x^{\nu}} \frac{\mathrm{d} x^{\mu}}{\mathrm{d} \tau} \frac{\mathrm{d} x^{\sigma}}{\mathrm{d} \tau} \tag{9.24} \end{equation*}
We can evaluate the left-hand side to find
\begin{equation*} \frac{\mathrm{d}}{\mathrm{d} \tau}\left(g_{\mu \nu} \frac{\mathrm{d} x^{\mu}}{\mathrm{d} \tau}\right)=g_{\mu \nu} \frac{\mathrm{d}^{2} x^{\mu}}{\mathrm{d} \tau^{2}}+\frac{\partial g_{\mu \nu}}{\partial x^{\sigma}} \frac{\mathrm{d} x^{\sigma}}{\mathrm{d} \tau} \frac{\mathrm{d} x^{\mu}}{\mathrm{d} \tau} \tag{9.25} \end{equation*}
The Euler-Lagrange equation becomes
\begin{equation*} g_{\mu \nu} \frac{\mathrm{d}^{2} x^{\mu}}{\mathrm{d} \tau^{2}}+\left(\frac{\partial g_{\mu \nu}}{\partial x^{\sigma}}-\frac{1}{2} \frac{\partial g_{\mu \sigma}}{\partial x^{\nu}}\right) \frac{\mathrm{d} x^{\mu}}{\mathrm{d} \tau} \frac{\mathrm{d} x^{\sigma}}{\mathrm{d} \tau}=0 \tag{9.26} \end{equation*}
We now tidy up by noting that, since the metric components are symmetric,
\begin{equation*} \dot{x}^{\mu} \dot{x}^{\sigma} g_{\mu \nu, \sigma}=\frac{\dot{x}^{\mu} \dot{x}^{\sigma}}{2}\left(g_{\mu \nu, \sigma}+g_{\sigma \nu, \mu}\right), \tag{9.27} \end{equation*}
which allows us to conclude (step V):
\begin{equation*} g_{\mu \nu} \frac{\mathrm{d}^{2} x^{\mu}}{\mathrm{d} \tau^{2}}+\frac{1}{2}\left(\frac{\partial g_{\mu \nu}}{\partial x^{\sigma}}+\frac{\partial g_{\sigma \nu}}{\partial x^{\mu}}-\frac{\partial g_{\mu \sigma}}{\partial x^{\nu}}\right) \frac{\mathrm{d} x^{\mu}}{\mathrm{d} \tau} \frac{\mathrm{d} x^{\sigma}}{\mathrm{d} \tau}=0 \tag{9.28} \end{equation*}
We define the all-down-index connection coefficients $\Gamma_{\lambda \mu \sigma}=g_{\rho \lambda} \Gamma^{\rho}{}_{\mu \sigma}$, and then we have the following.$^{8}$
\begin{equation*} \Gamma_{\lambda \mu \sigma}=\frac{1}{2}\left(\frac{\partial g_{\lambda \mu}}{\partial x^{\sigma}}+\frac{\partial g_{\lambda \sigma}}{\partial x^{\mu}}-\frac{\partial g_{\mu \sigma}}{\partial x^{\lambda}}\right) . \tag{9.29} \end{equation*}
The conclusion of this lengthy exercise$^{9}$ is that, given only the metric, we can work out connection coefficients and have access to the equation of motion of the freely falling particle. Note, however, that since the geodesic equation applies beyond the timelike geodesics followed by massive particles, eqn 9.29 is a general, geometrical expression linking the metric with the connection coefficients.
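Eqn 9.29 lends itself to a mechanical check of the earlier examples. The sketch below approximates the metric derivatives by central finite differences, builds $\Gamma_{\lambda \mu \sigma}$ from eqn 9.29, and raises the first index for the two diagonal metrics of Examples 9.1 and 9.2. The finite-difference approach, the function names, the sample points, and the value chosen for the constant $a$ are all assumptions made for illustration; they are not the method of the text.

```python
import math

def christoffel_lower(metric, point, h=1e-5):
    """All-down-index coefficients gamma[lam][mu][sig] from eqn 9.29.

    metric(point) must return the matrix g[a][b] as nested lists.
    Partial derivatives are approximated by central differences of step h.
    """
    n = len(point)

    def metric_derivative(c):
        """Matrix of derivatives of g[a][b] with respect to coordinate c."""
        up = list(point)
        up[c] += h
        dn = list(point)
        dn[c] -= h
        g_up, g_dn = metric(up), metric(dn)
        return [[(g_up[a][b] - g_dn[a][b]) / (2 * h) for b in range(n)]
                for a in range(n)]

    d = [metric_derivative(c) for c in range(n)]  # d[c][a][b] = g_{ab,c}
    return [[[0.5 * (d[sig][lam][mu] + d[mu][lam][sig] - d[lam][mu][sig])
              for sig in range(n)] for mu in range(n)] for lam in range(n)]

def raise_first_index_diagonal(gamma_lower, g):
    """Gamma^lam_{mu sig} = Gamma_{lam mu sig} / g_{lam lam} (diagonal metrics only)."""
    n = len(g)
    return [[[gamma_lower[lam][mu][sig] / g[lam][lam] for sig in range(n)]
             for mu in range(n)] for lam in range(n)]

def upper_coefficients(metric, point):
    """Convenience wrapper: Gamma^lam_{mu sig} at a point for a diagonal metric."""
    return raise_first_index_diagonal(christoffel_lower(metric, point), metric(point))

def sphere_metric(p):
    """Unit sphere of Example 9.1, coordinates (theta, phi)."""
    theta, _phi = p
    return [[1.0, 0.0], [0.0, math.sin(theta) ** 2]]

A = 0.7  # illustrative value of the constant a in Example 9.2

def paraboloid_metric(p):
    """Parabolic surface of Example 9.2, coordinates (r, theta)."""
    r, _theta = p
    return [[1.0 + A**2 * r**2, 0.0], [0.0, r**2]]

THETA0, R0 = 1.1, 1.3  # arbitrary sample points
sphere = upper_coefficients(sphere_metric, [THETA0, 0.3])
paraboloid = upper_coefficients(paraboloid_metric, [R0, 0.4])

# Index order is [upper][lower][lower]; compare with eqns 9.7 and 9.12.
print("Gamma^theta_{phi phi}:", sphere[0][1][1], "expected", -math.sin(THETA0) * math.cos(THETA0))
print("Gamma^phi_{theta phi}:", sphere[1][0][1], "expected", 1.0 / math.tan(THETA0))
print("Gamma^theta_{r theta}:", paraboloid[1][0][1], "expected", 1.0 / R0)
print("Gamma^r_{r r}:        ", paraboloid[0][0][0], "expected", A**2 * R0 / (1 + A**2 * R0**2))
print("Gamma^r_{theta theta}:", paraboloid[0][1][1], "expected", -R0 / (1 + A**2 * R0**2))
```

The index-raising helper divides by a single diagonal component, so it would need replacing by a full matrix inverse for a non-diagonal metric such as the one in Exercise 9.3.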

Example 9.5

Consider the metric line element
\begin{equation*} \mathrm{d} s^{2}=\frac{1}{t^{2}}\left(\mathrm{d} x^{2}-\mathrm{d} t^{2}\right) \tag{9.33} \end{equation*}
This corresponds to a metric with components $g_{t t}=-1 / t^{2}$ and $g_{x x}=1 / t^{2}$. The connection coefficients can be calculated using eqn 9.29, to yield
\begin{equation*} \Gamma_{t t t}=\frac{1}{t^{3}}, \quad \Gamma_{x x t}=\Gamma_{x t x}=-\Gamma_{t x x}=-\frac{1}{t^{3}} . \end{equation*}
Using $g^{t t}=-t^{2}$ and $g^{x x}=t^{2}$, we have $\Gamma^{t}{}_{t t}=\Gamma^{t}{}_{x x}=\Gamma^{x}{}_{x t}=-1 / t$.
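As a sketch of how these values arise from eqn 9.29, note that the metric depends only on $t$, so the only non-vanishing derivatives are $g_{t t, t}=2 / t^{3}$ and $g_{x x, t}=-2 / t^{3}$, while the off-diagonal component $g_{t x}$ vanishes identically. Substituting into the comma-notation form of eqn 9.29 gives
\begin{align*}
\Gamma_{t t t} & =\tfrac{1}{2}\left(g_{t t, t}+g_{t t, t}-g_{t t, t}\right)=\tfrac{1}{2} g_{t t, t}=\frac{1}{t^{3}}, \\
\Gamma_{x x t} & =\tfrac{1}{2}\left(g_{x x, t}+g_{x t, x}-g_{x t, x}\right)=\tfrac{1}{2} g_{x x, t}=-\frac{1}{t^{3}}, \\
\Gamma_{t x x} & =\tfrac{1}{2}\left(g_{t x, x}+g_{t x, x}-g_{x x, t}\right)=-\tfrac{1}{2} g_{x x, t}=\frac{1}{t^{3}} .
\end{align*}
Raising the first index with the inverse metric then gives, for example, $\Gamma^{t}{}_{x x}=g^{t t} \Gamma_{t x x}=-t^{2} \cdot t^{-3}=-1 / t$.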
$^{8}$ In comma notation
\begin{equation*} \Gamma_{\lambda \mu \sigma}=\frac{1}{2}\left(g_{\lambda \mu, \sigma}+g_{\lambda \sigma, \mu}-g_{\mu \sigma, \lambda}\right) \tag{9.29a} \end{equation*}
$^{9}$ There are some other useful expressions that can be employed in calculations. One useful identity is
\begin{equation*} g_{\alpha \beta, \gamma}=\Gamma_{\alpha \beta \gamma}+\Gamma_{\beta \alpha \gamma} . \tag{9.30} \end{equation*}
Another is
\begin{equation*} \Gamma^{\alpha}{}_{\beta \alpha}=\frac{\partial}{\partial x^{\beta}} \ln \sqrt{-g}, \end{equation*}
where $g$ is the determinant of the metric tensor. For a diagonal metric the following hold:
\begin{align*}
\Gamma^{\mu}{}_{\nu \lambda} & =0, \\
\Gamma^{\mu}{}_{\lambda \lambda} & =-\frac{1}{2 g_{\mu \mu}} \frac{\partial g_{\lambda \lambda}}{\partial x^{\mu}}, \\
\Gamma^{\mu}{}_{\mu \lambda} & =\frac{\partial}{\partial x^{\lambda}}\left(\ln \left|g_{\mu \mu}\right|^{\frac{1}{2}}\right), \tag{9.32} \\
\Gamma^{\mu}{}_{\mu \mu} & =\frac{\partial}{\partial x^{\mu}}\left(\ln \left|g_{\mu \mu}\right|^{\frac{1}{2}}\right) .
\end{align*}
Here $\mu \neq \nu \neq \lambda$ and we don't sum over repeated indices.

Chapter summary

  • The connection coefficients may be extracted using a simple routine based on extremizing the action.
  • The metric leads directly to the connection coefficients.

Exercises

(9.1) Using the methods described in the chapter, extract the connection coefficients for two-dimensional plane polar coordinates.
(9.2) The torus has a line element
\begin{equation*} \mathrm{d} s^{2}=(c+a \cos v)^{2} \,\mathrm{d} u^{2}+a^{2} \,\mathrm{d} v^{2} \end{equation*}
Show that we obtain the connection coefficients
\begin{equation*} \Gamma^{v}{}_{u u}=\frac{\sin v}{a}(c+a \cos v), \quad \Gamma^{u}{}_{u v}=-\frac{a \sin v}{(c+a \cos v)} . \tag{9.36} \end{equation*}
(9.3) Consider the non-diagonal metric
\begin{equation*} \mathrm{d} s^{2}=\mathrm{d} u^{2}+\mathrm{d} v^{2}+2 \,\mathrm{d} u \,\mathrm{d} v \cos \theta(u, v) \tag{9.37} \end{equation*}
Show that the non-zero connection coefficients are given by
\begin{gather*}
\Gamma^{u}{}_{u u}=\frac{\cos \theta}{\sin \theta} \theta_{, u}, \quad \Gamma^{v}{}_{u u}=-\frac{1}{\sin \theta} \theta_{, u}, \tag{9.38} \\
\Gamma^{v}{}_{v v}=\frac{\cos \theta}{\sin \theta} \theta_{, v}, \quad \Gamma^{u}{}_{v v}=-\frac{1}{\sin \theta} \theta_{, v} .
\end{gather*}
(9.4) Consider Rindler spacetime with metric line element
\begin{equation*} \mathrm{d} s^{2}=-x^{2} \,\mathrm{d} t^{2}+\mathrm{d} x^{2} \tag{9.39} \end{equation*}
(a) Show that the equations of motion are
\begin{equation*} 2 \dot{x} \dot{t}+x \ddot{t}=0, \quad \ddot{x}+x \dot{t}^{2}=0 \tag{9.40} \end{equation*}
and extract the connection coefficients.
(b) We can use the physical interpretation of the length parametrization to allow some insight into this spacetime. Define the velocity in the $x$ direction as $v=\mathrm{d} x / \mathrm{d} t$ and show that
\begin{equation*} \ddot{x}=-\frac{x}{x^{2}-v^{2}} \tag{9.41} \end{equation*}
Now suppose that the particle does not have a velocity in the $x$-direction, so that $v=0$. The equation of motion says that in order to have $v=0$ in this spacetime we must have an observer undergoing a uniform acceleration $\ddot{x}=-1 / x$, which diverges as $x \rightarrow 0$. We shall see this property again when we examine the spherically symmetric Schwarzschild geometry.
(9.5) Consider the rotating-frame line element
\begin{align*}
\mathrm{d} s^{2} & =-\left[1-\Omega^{2}\left(x^{2}+y^{2}\right)\right] \mathrm{d} t^{2}+\mathrm{d} x^{2}+\mathrm{d} y^{2} \\
& \quad+\mathrm{d} z^{2}-2 \Omega y \,\mathrm{d} x \,\mathrm{d} t+2 \Omega x \,\mathrm{d} y \,\mathrm{d} t \tag{9.42}
\end{align*}
(a) Find the matrix $g^{\mu \nu}$.
(b) Compute the connection coefficients for the space described by this line element.
(9.6) (a) Express the line element from the previous question in cylindrical polars.
(b) Compute the connection coefficients in cylindrical polars.
(9.7) The Schwarzschild metric gives an interval
\begin{align*}
s= & \int \mathrm{d} \lambda\left[\mathrm{e}^{2 \Phi}\left(\frac{\mathrm{d} t}{\mathrm{d} \lambda}\right)^{2}-\mathrm{e}^{2 \Lambda}\left(\frac{\mathrm{d} r}{\mathrm{d} \lambda}\right)^{2}\right. \\
& \left.-r^{2}\left(\frac{\mathrm{d} \theta}{\mathrm{d} \lambda}\right)^{2}-r^{2} \sin ^{2} \theta\left(\frac{\mathrm{d} \phi}{\mathrm{d} \lambda}\right)^{2}\right]^{\frac{1}{2}} \tag{9.43}
\end{align*}
where $\Phi$ and $\Lambda$ are functions of $r$. Find the connection coefficients.
(9.8) We can start to understand the space represented by the Poincaré half plane in Example 9.3 by computing its geodesics. The geometry is defined by its metric in the upper half plane, $r>0$, only.
(a) Consider the equation of motion $\ddot{x}-2 \dot{r} \dot{x} / r=0$, where the dot indicates a derivative with respect to the affine parameter $\lambda$. Show that this equation is solved by
\begin{equation*} \dot{x}=\frac{r^{2}}{a}, \tag{9.44} \end{equation*}
where $a$ is a constant.
(b) Consider the length parametrization condition $L=1$ and show that this yields
\begin{equation*} \mathrm{d} \lambda=\frac{\mathrm{d} t}{\sin t} \tag{9.45} \end{equation*}
where $r=a \sin t$.
(c) Use these results to show that $x$ is given by
\begin{equation*} x=-a \cos t+x_{0} \tag{9.46} \end{equation*}
where $x_{0}$ is a constant offset.
(d) Argue that this shows that the geodesics are circular arcs, centred on $(x, r)=\left(x_{0}, 0\right)$ with radius $a$, as shown in Fig. 9.2.
(e) Compute the length $\int \mathrm{d} \lambda$ of a geodesic starting at $t=a$ and finishing at $t=b$. Use this to show that the length of a geodesic starting at $t=0$ and ending at $t=\pi$ is infinite.
You can see why this is the case by considering a ruler of interval length $\Delta s$ parallel to the $x$ axis. Since $r$ is constant we have $\Delta s=\Delta x / r$. As a result, rulers of an equivalent interval length $\Delta s$ must have larger coordinate length $\Delta x$ if they are at a larger height $r$, as shown in Fig. 9.2.
Fig. 9.2 The Poincaré half plane from Exercise 9.8 and Example 9.3. An example geodesic is shown on the right. On the left several lines of equivalent interval length $\Delta s$ are shown.
(9.9) We end up with the same geodesics if we extremize $L=\sqrt{-g_{\mu \nu} \dot{x}^{\mu} \dot{x}^{\nu}}$, and if we extremize $L=\frac{1}{2} g_{\mu \nu} \dot{x}^{\mu} \dot{x}^{\nu}$. Show this by finding the Euler-Lagrange equation for a function
\begin{equation*} L=F\left(\sqrt{g_{\mu \nu} \dot{x}^{\mu} \dot{x}^{\nu}}\right) \tag{9.47} \end{equation*}
where $\dot{x}^{\mu}=\frac{\mathrm{d} x^{\mu}}{\mathrm{d} \lambda}$, $\lambda$ is the proper length and $F$ is any monotonic function.
This means we can equally well use a Lagrangian $L=\frac{1}{2} g_{\mu \nu} \dot{x}^{\mu} \dot{x}^{\nu}$, which resembles the kinetic energy of a non-relativistic particle.
(9.10) Consider the moving-coordinate metric
\begin{equation*} \mathrm{d} s^{2}=-\left(1-v^{2}\right) \mathrm{d} t^{2}+\mathrm{d} x^{2}+\mathrm{d} y^{2}+\mathrm{d} z^{2}-2 v \,\mathrm{d} x \,\mathrm{d} t . \tag{9.48} \end{equation*}
By extremizing the world line, show that the geodesics are straight lines.
(9.11) Consider a two-dimensional space with metric line element
\begin{equation*} \mathrm{d} s^{2}=(1+r) \,\mathrm{d} r^{2}+r^{2} \,\mathrm{d} \phi^{2} . \tag{9.49} \end{equation*}
By computing the acceleration, determine whether the curve $r(\lambda)=(3 \lambda / 2)^{2 / 3}-1$, $\phi(\lambda)=0$, with $\lambda$ an affine parameter, is a geodesic.

10

10.1 Observers and their observations
10.2 Coordinate and non-coordinate bases
10.3 The orthonormal frame
10.4 Freely falling frames
Chapter summary
Exercises
$^{1}$ Or, perhaps more memorably, in the words of the Time Traveller, 'There is no difference between Time and any of the three dimensions of Space except that our consciousness moves along it. But some foolish people have got hold of the wrong side of that idea.' H. G. Wells (1866-1946), The Time Machine.
Fig. 10.1 A measurement of momentum $\boldsymbol{p}$. In this figure, we use the familiar $\hat{x}$ and $\hat{y}$ axes of two-dimensional space (with basis vectors $\boldsymbol{e}_{\hat{x}}$ and $\boldsymbol{e}_{\hat{y}}$), but the idea carries over to the four-dimensional spacetime axes of an orthonormal frame with basis vectors $\boldsymbol{e}_{\hat{0}}, \boldsymbol{e}_{\hat{1}}, \boldsymbol{e}_{\hat{2}}$ and $\boldsymbol{e}_{\hat{3}}$.

Making measurements in relativity

Abstract

Why were another seven years required for the construction of the general theory of relativity? The main reason lies in the fact that it is not easy to free oneself from the idea that coordinates must have an immediate metrical meaning. Einstein quoted in P. A. Schilpp (ed.) Albert Einstein - Philosopher Scientist (1969).

The stage on which the drama of general relativity is played out is curved spacetime. However, measurements are made locally by observers in laboratories. Over the small distances involved in a typical experiment, an observer will experience spacetime as if it were flat spacetime with the Minkowski metric. We therefore need to know how to relate the observations made by observers in their local spacetime to the objects we manipulate in the curved spacetime of general relativity. The key is that measurements are made in local orthonormal frames: frames of reference set up by observers where the basis vectors are mutually orthogonal and normalized. In this book, component labels in such frames will be given with hats, so that the basis vectors, for example, will be written as $\boldsymbol{e}_{\hat{\alpha}}$. By definition, the metric of the flat, local frame is simply the Minkowski metric, with components $\eta_{\hat{\mu} \hat{\nu}}=\boldsymbol{e}_{\hat{\mu}} \cdot \boldsymbol{e}_{\hat{\nu}}=\operatorname{diag}(-1,1,1,1)$.
A further point to note in these discussions stems from Einstein's observation in the quotation above. When presented with a $t$ coordinate or an $r$ coordinate, it is tempting to assume that $t$ must represent time and $r$ the radius. This is not correct. Coordinates are intrinsically meaningless labels which are only given meaning by relating them to measurements and intervals determined by observers in their local, inertial frames of reference. This can be summed up using the slogan that coordinates have no immediate metrical significance.$^{1}$

10.1 Observers and their observations

A particle passes through a laboratory as shown in Fig. 10.1. The particle has a momentum $\boldsymbol{p}$ which, expressed as a vector, is a quantity independent of any set of coordinates. An observer in the laboratory makes measurements by carrying around their own orthonormal axes $\boldsymbol{e}_{\hat{0}}, \boldsymbol{e}_{\hat{1}}, \boldsymbol{e}_{\hat{2}}$ and $\boldsymbol{e}_{\hat{3}}$. Referred to these axes, the vector can be expressed in coordinates as $\boldsymbol{p}=p^{\hat{\mu}} \boldsymbol{e}_{\hat{\mu}}$. To make a measurement of a particular component of a vector the observer projects out the component using their local axes. For example, measuring the momentum along the $\hat{\alpha}$ direction means that the observer makes the projection via a dot product $\boldsymbol{p} \cdot \boldsymbol{e}_{\hat{\alpha}}$.

Example 10.1

The observer makes the projection
\begin{align*} \boldsymbol{p} \cdot \boldsymbol{e}_{\hat{\alpha}} & =p^{\hat{\mu}} \boldsymbol{e}_{\hat{\mu}} \cdot \boldsymbol{e}_{\hat{\alpha}} \\ & =p^{\hat{\mu}} \eta_{\hat{\mu} \hat{\alpha}}=p_{\hat{\alpha}}, \tag{10.1} \end{align*}
showing that the observer has access to the component $p_{\hat{\alpha}}$. The observer can use $\eta^{\hat{\alpha} \hat{\beta}}$ to raise the index, if they want the up-index form of the component, via $p^{\hat{\beta}}=\eta^{\hat{\alpha} \hat{\beta}} p_{\hat{\alpha}}$.
If we spot an observer, how do we know what their local orthonormal axes will look like? That is, how will they orient their orthonormal coordinates? Start by noting that the observer's world line is characterized by their velocity vector $\boldsymbol{u}_{\mathrm{obs}}$, which is tangent to the world line (Fig. 10.2). The key is that the timelike vector of the local basis $\boldsymbol{e}_{\hat{0}}$ will also be tangent to the observer's world line, since this is the direction that a clock at rest in the observer's frame moves in spacetime. We therefore have
\begin{equation*} \boldsymbol{e}_{\hat{0}}=\boldsymbol{u}_{\mathrm{obs}} . \tag{10.2} \end{equation*}
Therefore, expressed in some coordinate frame, the observer's timelike axis $\boldsymbol{e}_{\hat{0}}$ has the components we would ascribe to their tangent vector $\boldsymbol{u}_{\mathrm{obs}}$. We write these components of the observer's timelike basis vector
\begin{equation*} \left(\boldsymbol{e}_{\hat{0}}\right)^{\mu}=u_{\mathrm{obs}}^{\mu} . \tag{10.3} \end{equation*}
The other components of the observer's orthonormal system can then be picked out, subject to being orthogonal to $\boldsymbol{e}_{\hat{0}}$ and to each other.
Example 10.2
Consider the constantly accelerated observer in Minkowski space from Chapter 2. They have a timelike basis vector with components
\begin{equation*} \left(\boldsymbol{e}_{\hat{0}}\right)^{\mu}=u_{\mathrm{obs}}^{\mu}(\tau)=(\cosh (g \tau), \sinh (g \tau), 0,0) . \tag{10.4} \end{equation*}
Pick $\boldsymbol{e}_{\hat{2}}$ and $\boldsymbol{e}_{\hat{3}}$ to point along the $y$- and $z$-directions. The remaining 4-vector $\boldsymbol{e}_{\hat{1}}$ has the form $(f(\tau), g(\tau), 0,0)$, where $f(\tau)$ and $g(\tau)$ are functions to be determined (the function $g(\tau)$ should not be confused with the constant acceleration $g$). We require orthogonality of the observer's basis vectors, which is to say that
\begin{equation*} \eta_{\mu \nu}\left(\boldsymbol{e}_{\hat{0}}\right)^{\mu}\left(\boldsymbol{e}_{\hat{1}}\right)^{\nu}=-\cosh (g \tau) f(\tau)+\sinh (g \tau) g(\tau)=0 . \tag{10.5} \end{equation*}
We also require that the vectors are normalized in the observer's frame, so that
\begin{equation*} -f^{2}(\tau)+g^{2}(\tau)=1 . \end{equation*}
This allows us to solve for $\left(\boldsymbol{e}_{\hat{1}}\right)^{\mu}$ and we end up with
\begin{equation*} \begin{aligned} \left(\boldsymbol{e}_{\hat{0}}\right)^{\mu} & =(\cosh (g \tau), \sinh (g \tau), 0,0), \\ \left(\boldsymbol{e}_{\hat{1}}\right)^{\mu} & =(\sinh (g \tau), \cosh (g \tau), 0,0), \\ \left(\boldsymbol{e}_{\hat{2}}\right)^{\mu} & =(0,0,1,0), \\ \left(\boldsymbol{e}_{\hat{3}}\right)^{\mu} & =(0,0,0,1) . \end{aligned} \end{equation*}
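The orthonormality of the basis found in Example 10.2 can be checked numerically. The following is a minimal sketch, assuming the $(-,+,+,+)$ signature used in the text; the function names `accelerated_tetrad` and `frame_metric` and the sample values of the acceleration and proper time are illustrative choices, not part of the original example.

```python
import numpy as np

# Minkowski metric with signature (-, +, +, +), as in the text.
ETA = np.diag([-1.0, 1.0, 1.0, 1.0])


def accelerated_tetrad(g, tau):
    """Return a 4x4 array whose row a holds the components (e_a_hat)^mu.

    g is the constant proper acceleration and tau the proper time.
    """
    c, s = np.cosh(g * tau), np.sinh(g * tau)
    return np.array([
        [c, s, 0.0, 0.0],      # e_0hat, equal to u_obs
        [s, c, 0.0, 0.0],      # e_1hat
        [0.0, 0.0, 1.0, 0.0],  # e_2hat
        [0.0, 0.0, 0.0, 1.0],  # e_3hat
    ])


def frame_metric(tetrad, metric=ETA):
    """Matrix of dot products e_a_hat . e_b_hat = g_{mu nu} (e_a)^mu (e_b)^nu."""
    return tetrad @ metric @ tetrad.T


# Arbitrary illustrative values (assumptions, not from the text).
g_accel, tau = 0.7, 1.3
gram = frame_metric(accelerated_tetrad(g_accel, tau))
print(np.allclose(gram, ETA))
```

If the basis is orthonormal, the matrix of dot products reproduces $\eta_{\hat{\mu}\hat{\nu}}$ at every value of $\tau$, which is what the final comparison tests.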
Fig. 10.2 Local orthonormal frames picked out along a world line by setting $\boldsymbol{e}_{\hat{0}}=\boldsymbol{u}$.
$^{2}$ We can evaluate the dot product using the Minkowski tensor in the rest frame of the particle. The result is a scalar, so is true in any frame.
$^{3}$ In our units, the vector $\boldsymbol{k}$ has components $k^{\mu}=\left(\omega, k^{x}, k^{y}, k^{z}\right)$ and $\omega=|\vec{k}|$ for light. We also assume the quantum mechanical relationship $E=\hbar \omega$, but set $\hbar=1$.
$^{4}$ This result, discussed in the book by Hartle, will be seen again in the discussion of black holes in Chapters 26 and 27.
One particularly helpful tool is that the energy of a particle measured by an observer with velocity $\boldsymbol{u}_{\mathrm{obs}}$ is given by $E=-\boldsymbol{p} \cdot \boldsymbol{u}_{\mathrm{obs}}$. This is easily confirmed by noting that
\begin{align*} -\boldsymbol{p} \cdot \boldsymbol{u}_{\mathrm{obs}} & =-p^{\hat{\mu}} \boldsymbol{e}_{\hat{\mu}} \cdot \boldsymbol{e}_{\hat{0}} \\ & =-p^{\hat{\mu}} \eta_{\hat{\mu} \hat{0}}=-p_{\hat{0}}=E, \tag{10.8} \end{align*}
where, in the final step, we remember that $E=p^{\hat{0}}=-p_{\hat{0}}$ because $\eta_{\hat{0} \hat{0}}=-1$ in the orthonormal frame.

Example 10.3

Consider Minkowski space. In a frame where a particle is at rest, the particle has $p^{\mu}=(m, 0,0,0)$. Relative to this frame the observer travels with constant speed $v$ along the $x$-axis, so the 4-velocity of the observer has components $u_{\mathrm{obs}}^{\mu}=(\gamma, \gamma v, 0,0)$, which are therefore also the components of the local basis vector $\left(\boldsymbol{e}_{\hat{0}}\right)^{\mu}$. The energy of the particle measured by the observer, when the world lines of particle and observer intersect, is$^{2}$
\begin{equation*} E=-\boldsymbol{p} \cdot \boldsymbol{u}_{\mathrm{obs}}=m \gamma . \tag{10.9} \end{equation*}
So the particle has energy $E=\gamma m c^{2}$ (restoring factors of $c$) as we expect.
Now consider the accelerated observer from the previous example, measuring light from a star in Minkowski space. The star gives out light at a frequency $\omega$. The wave 4-vector of a photon reaching the observer has components$^{3}$ $k^{\mu}=(\omega, \omega, 0,0)$ in the star's rest frame. The method for finding the frequency the observer measures is the same as for the energy, to which frequency is proportional. We evaluate the observed frequency $\omega(\tau)=-\boldsymbol{k} \cdot \boldsymbol{u}_{\mathrm{obs}}$. Computing, we find
\begin{align*} \omega(\tau) & =-\boldsymbol{k} \cdot \boldsymbol{u}_{\mathrm{obs}} \\ & =k^{0} u^{0}-k^{1} u^{1} \\ & =\omega[\cosh (g \tau)-\sinh (g \tau)] \\ & =\omega \exp (-g \tau), \tag{10.10} \end{align*}
which demonstrates that the shift in observed frequency varies exponentially.$^{4}$
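Both measurements in Example 10.3 follow from the same projection $-\boldsymbol{p} \cdot \boldsymbol{u}_{\mathrm{obs}}$, which can be checked numerically. The following is a minimal sketch, assuming units with $c=\hbar=1$ and the $(-,+,+,+)$ signature; the helper names `minkowski_dot` and `measured_energy` and all numerical values are illustrative assumptions.

```python
import numpy as np

ETA = np.diag([-1.0, 1.0, 1.0, 1.0])  # Minkowski metric, signature (-,+,+,+)


def minkowski_dot(a, b):
    """Dot product a . b = eta_{mu nu} a^mu b^nu of two 4-vectors."""
    return float(a @ ETA @ b)


def measured_energy(p, u_obs):
    """Energy (or frequency) seen by an observer with 4-velocity u_obs."""
    return -minkowski_dot(p, u_obs)


# Part 1: particle at rest, observer moving with speed v along x.
m, v = 2.0, 0.6                      # illustrative values
gamma = 1.0 / np.sqrt(1.0 - v**2)
p = np.array([m, 0.0, 0.0, 0.0])
u_moving = gamma * np.array([1.0, v, 0.0, 0.0])
energy = measured_energy(p, u_moving)          # should equal m * gamma

# Part 2: accelerated observer receiving light of emitted frequency omega.
omega_emit, g_accel, tau = 1.0, 0.5, 2.0       # illustrative values
k = omega_emit * np.array([1.0, 1.0, 0.0, 0.0])
u_accel = np.array([np.cosh(g_accel * tau), np.sinh(g_accel * tau), 0.0, 0.0])
omega_obs = measured_energy(k, u_accel)        # should equal omega * exp(-g tau)

print(energy, m * gamma)
print(omega_obs, omega_emit * np.exp(-g_accel * tau))
```

Each printed pair should agree to rounding error, reproducing eqns 10.9 and 10.10 for the chosen values.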
The procedure above allows us to understand what an observer will measure. However, this is complicated by the fact that the natural, orthonormal coordinate system that observers employ has an unpleasant mathematical property: it is a non-coordinate basis, as we now describe.

10.2 Coordinate and non-coordinate bases

Recall from Chapter 3 that plane polar coordinates had the property that $\left|\boldsymbol{e}_{r}\right|=1$ but that $\left|\boldsymbol{e}_{\theta}\right|=r$. That is, the length of the $\boldsymbol{e}_{\theta}$ basis vector is proportional to the distance away from the origin. This coordinate system was derived from the Cartesian one by expressing components as partial derivatives with respect to the Cartesian coordinates. We call such a coordinate system a coordinate basis.
We could choose to normalize $\boldsymbol{e}_{\theta}$ so that we have $\boldsymbol{e}_{\hat{\theta}}=\boldsymbol{e}_{\theta} / r$, which yields an orthonormal basis set $\boldsymbol{e}_{\hat{r}}\left(=\boldsymbol{e}_{r}\right)$ and $\boldsymbol{e}_{\hat{\theta}}$. Although this is the coordinate set we would most probably want to choose when plotting the position of events in the laboratory, it does not have the property that it is derivable directly from the Cartesian coordinates. That is to say that the basis vectors $\boldsymbol{e}_{\hat{r}}$ and $\boldsymbol{e}_{\hat{\theta}}$ cannot be written as an expansion in Cartesian basis vectors with prefactors given in terms of derivatives of the Cartesian components with respect to $r$ and $\theta$ (see eqn 3.7). We call such a basis a non-coordinate basis.$^{5}$
Example 10.4
Non-coordinate bases are useful for many problems. Consider, for example, the Kepler problem, which is conventionally discussed using an orthonormal basis and cylindrical coordinates. In such a frame, the velocity $\boldsymbol{v}=v^{\hat{r}} \boldsymbol{e}_{\hat{r}}+v^{\hat{\theta}} \boldsymbol{e}_{\hat{\theta}}$ is given by
\begin{equation*} \boldsymbol{v}=\frac{\mathrm{d} r}{\mathrm{d} t} \boldsymbol{e}_{\hat{r}}+r \frac{\mathrm{d} \boldsymbol{e}_{\hat{r}}}{\mathrm{d} t}, \tag{10.11} \end{equation*}
that is, we take a time derivative of $\boldsymbol{r}=r \boldsymbol{e}_{\hat{r}}$. Acceleration is then given by
\begin{equation*} \boldsymbol{a}=\frac{\mathrm{d} \boldsymbol{v}}{\mathrm{d} t}=\frac{\mathrm{d} v^{\hat{r}}}{\mathrm{d} t} \boldsymbol{e}_{\hat{r}}+v^{\hat{r}} \frac{\mathrm{d} \boldsymbol{e}_{\hat{r}}}{\mathrm{d} t}+\frac{\mathrm{d} v^{\hat{\theta}}}{\mathrm{d} t} \boldsymbol{e}_{\hat{\theta}}+v^{\hat{\theta}} \frac{\mathrm{d} \boldsymbol{e}_{\hat{\theta}}}{\mathrm{d} t} . \tag{10.12} \end{equation*}
The basis vectors, as shown in Fig. 10.3, obey
\begin{gather*} \frac{\mathrm{d} \boldsymbol{e}_{\hat{r}}}{\mathrm{d} t}=\frac{\mathrm{d} \theta}{\mathrm{d} t} \boldsymbol{e}_{\hat{\theta}}=\omega \boldsymbol{e}_{\hat{\theta}}, \tag{10.13}\\ \frac{\mathrm{d} \boldsymbol{e}_{\hat{\theta}}}{\mathrm{d} t}=-\frac{\mathrm{d} \theta}{\mathrm{d} t} \boldsymbol{e}_{\hat{r}}=-\omega \boldsymbol{e}_{\hat{r}}, \tag{10.14} \end{gather*}
where $\omega=\mathrm{d} \theta / \mathrm{d} t$. The components of the acceleration are then written as
\begin{equation*} a^{\hat{r}}=\frac{\mathrm{d} v^{\hat{r}}}{\mathrm{d} t}-v^{\hat{\theta}} \frac{\mathrm{d} \theta}{\mathrm{d} t}=\frac{\mathrm{d}^{2} r}{\mathrm{d} t^{2}}-r\left(\frac{\mathrm{d} \theta}{\mathrm{d} t}\right)^{2}=\ddot{r}-r \dot{\theta}^{2} \tag{10.15} \end{equation*}
and
\begin{equation*} a^{\hat{\theta}}=\frac{\mathrm{d} v^{\hat{\theta}}}{\mathrm{d} t}+v^{\hat{r}} \frac{\mathrm{d} \theta}{\mathrm{d} t}=\frac{\mathrm{d}}{\mathrm{d} t}\left(r \frac{\mathrm{d} \theta}{\mathrm{d} t}\right)+\frac{\mathrm{d} r}{\mathrm{d} t} \cdot \frac{\mathrm{d} \theta}{\mathrm{d} t}=r \ddot{\theta}+2 \dot{r} \dot{\theta} . \tag{10.16} \end{equation*}
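The acceleration components in eqns 10.15 and 10.16 can be compared against a direct calculation in Cartesian coordinates. The following is a rough numerical sketch, assuming an arbitrary test trajectory $r(t)=1+t/2$, $\theta(t)=t^{2}$ (chosen only for illustration) and a central finite-difference estimate of the Cartesian acceleration; the function names are hypothetical.

```python
import numpy as np


def r_of_t(t):
    return 1.0 + 0.5 * t       # illustrative radial motion


def theta_of_t(t):
    return t**2                # illustrative angular motion


def position(t):
    """Cartesian position (x, y) of the test trajectory."""
    r, th = r_of_t(t), theta_of_t(t)
    return np.array([r * np.cos(th), r * np.sin(th)])


def cartesian_acceleration(t, h=1e-4):
    """Second derivative of position by central finite differences."""
    return (position(t + h) - 2.0 * position(t) + position(t - h)) / h**2


def polar_acceleration(t):
    """Orthonormal-frame components from eqns 10.15 and 10.16."""
    r, r_dot, r_ddot = r_of_t(t), 0.5, 0.0
    th_dot, th_ddot = 2.0 * t, 2.0
    a_r = r_ddot - r * th_dot**2
    a_theta = r * th_ddot + 2.0 * r_dot * th_dot
    return np.array([a_r, a_theta])


t0 = 0.8
th = theta_of_t(t0)
e_r_hat = np.array([np.cos(th), np.sin(th)])        # unit radial vector
e_theta_hat = np.array([-np.sin(th), np.cos(th)])   # unit tangential vector

a_cart = cartesian_acceleration(t0)
a_projected = np.array([a_cart @ e_r_hat, a_cart @ e_theta_hat])
a_formula = polar_acceleration(t0)
print(a_projected, a_formula)
```

The projections of the Cartesian acceleration onto $\boldsymbol{e}_{\hat{r}}$ and $\boldsymbol{e}_{\hat{\theta}}$ should match the formula values up to the finite-difference error.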
We shall usually work in coordinate bases owing to their neat geometrical properties. However, observers work in local frames, which is where their measurements are made. These are chosen to be orthonormal and hence usually have non-coordinate bases. We therefore need to be able to transform between coordinate bases (where we do our calculations) and non-coordinate bases (where measurements are made).
The basis vectors in a coordinate basis are related to the metric via the definition
\begin{equation*} \boldsymbol{e}_{\mu} \cdot \boldsymbol{e}_{\nu}=g_{\mu \nu} . \tag{10.17} \end{equation*}
Measurements are made in the local orthonormal frame, with Minkowski metric whose components are related to the dot product of the basis vectors via
\begin{equation*} \boldsymbol{e}_{\hat{\alpha}} \cdot \boldsymbol{e}_{\hat{\beta}}=\eta_{\hat{\alpha} \hat{\beta}} . \tag{10.18} \end{equation*}
Any vector can be decomposed in either system:
\begin{equation*} \boldsymbol{a}=a^{\alpha} \boldsymbol{e}_{\alpha}=a^{\hat{\beta}} \boldsymbol{e}_{\hat{\beta}} . \tag{10.19} \end{equation*}
$^{5}$ As described in Chapter 3, a non-coordinate basis can be identified because its basis vectors don't commute, in contrast to the basis vectors of a coordinate basis which do commute.

$\downarrow$ See Chapter 20 for a discussion of the Kepler problem.

Fig. 10.3 The change of the vector $\boldsymbol{e}_{\hat{r}}$ is in the direction $\boldsymbol{e}_{\hat{\theta}}$, while the change in $\boldsymbol{e}_{\hat{\theta}}$ is in the direction $-\boldsymbol{e}_{\hat{r}}$.
In order to transform between these two descriptions, we write the components of the orthonormal basis vectors in the coordinate frame as $\left(\boldsymbol{e}_{\hat{\beta}}\right)^{\alpha}$, giving us an expression
\begin{equation*} a^{\alpha}=a^{\hat{\beta}}\left(\boldsymbol{e}_{\hat{\beta}}\right)^{\alpha} . \tag{10.20} \end{equation*}
$^{6}$ Which translates from German into English as many-leg. Since the vielbein with which we're concerned describes (3+1)-dimensional spacetime, it is sometimes called a vierbein ($\equiv$ 4-leg). The bracket in the notation $\left(\boldsymbol{e}_{\mu}\right)^{\hat{\alpha}}$ is really just there for aesthetic reasons to remind us that a vielbein combines information about two different sorts of coordinate systems. We could write the components $e_{\mu}^{\hat{\alpha}}$ if we prefer.
$^{7}$ One way to think of the action of a vielbein is that the coordinate frame possesses a set of global coordinates [e.g. $(t, r, \theta, \phi)$] and that we make them local using the vielbein.
$^{8}$ Our use of this notation here follows Hartle. We will record vielbein components in margin notes throughout the book. The vielbein components in this case are
\begin{equation*} \left(\boldsymbol{e}_{\hat{r}}\right)^{r}=1, \quad\left(\boldsymbol{e}_{\hat{\theta}}\right)^{\theta}=\frac{1}{r}, \quad\left(\boldsymbol{e}_{r}\right)^{\hat{r}}=1, \quad\left(\boldsymbol{e}_{\theta}\right)^{\hat{\theta}}=r . \end{equation*}
Objects such as $\left(\boldsymbol{e}_{\hat{\beta}}\right)^{\alpha}$ are a set of matrices known as the components of a vielbein.$^{6}$ These turn out to be very useful.$^{7}$ We can also write the components of the coordinate basis vectors in the orthonormal frame, which leads to the expression
\begin{equation*} a^{\hat{\beta}}=a^{\alpha}\left(\boldsymbol{e}_{\alpha}\right)^{\hat{\beta}} . \tag{10.21} \end{equation*}

Example 10.5

Consider cylindrical-polar coordinates. The coordinate basis consists of vectors $\boldsymbol{e}_{r}$ and $\boldsymbol{e}_{\theta}$ and we write components $(r, \theta)$. The metric is given by the line element
\begin{equation*} \mathrm{d} s^{2}=\mathrm{d} r^{2}+r^{2} \,\mathrm{d} \theta^{2} . \tag{10.22} \end{equation*}
The off-diagonal elements of the metric are zero, while the diagonal components are $g_{r r}=1$ and $g_{\theta \theta}=r^{2}$. From $g_{\mu \nu}=\boldsymbol{e}_{\mu} \cdot \boldsymbol{e}_{\nu}$, the coordinate basis vectors obey
\begin{equation*} \left|\boldsymbol{e}_{r}\right|=1, \quad\left|\boldsymbol{e}_{\theta}\right|=r, \tag{10.23} \end{equation*}
and we argued that a good set of orthonormal basis vectors is given by
\begin{equation*} \boldsymbol{e}_{\hat{r}}=\boldsymbol{e}_{r}, \quad \boldsymbol{e}_{\hat{\theta}}=\frac{1}{r} \boldsymbol{e}_{\theta} . \tag{10.24} \end{equation*}
Let's first look at how vielbein notation works. Quite trivially, we can say that the components of the coordinate basis vectors in the coordinate basis are, by definition,
\begin{equation*} \left(\boldsymbol{e}_{r}\right)^{\mu}=(1,0), \quad\left(\boldsymbol{e}_{\theta}\right)^{\mu}=(0,1) . \tag{10.25} \end{equation*}
Similarly, in the orthonormal frame we write the trivial equation
\begin{equation*} \left(\boldsymbol{e}_{\hat{r}}\right)^{\hat{\mu}}=(1,0), \quad\left(\boldsymbol{e}_{\hat{\theta}}\right)^{\hat{\mu}}=(0,1) . \tag{10.26} \end{equation*}
More interestingly, the components of the orthonormal basis vectors in the coordinate basis can be written down and are$^{8}$
\begin{equation*} \left(\boldsymbol{e}_{\hat{r}}\right)^{\mu}=(1,0), \quad\left(\boldsymbol{e}_{\hat{\theta}}\right)^{\mu}=(0,1 / r) . \tag{10.27} \end{equation*}
How do we know that the orthonormal vectors we have selected are correct? The key is that they must obey the defining relationship $\eta_{\hat{\mu} \hat{\nu}}=\boldsymbol{e}_{\hat{\mu}} \cdot \boldsymbol{e}_{\hat{\nu}}=g_{\alpha \beta}\left(\boldsymbol{e}_{\hat{\mu}}\right)^{\alpha}\left(\boldsymbol{e}_{\hat{\nu}}\right)^{\beta}$ and this is quickly checked using $g_{r r}=1$ and $g_{\theta \theta}=r^{2}$. For example,
\begin{align*} & \boldsymbol{e}_{\hat{r}} \cdot \boldsymbol{e}_{\hat{r}}=g_{r r}\left(\boldsymbol{e}_{\hat{r}}\right)^{r}\left(\boldsymbol{e}_{\hat{r}}\right)^{r}+g_{\theta \theta}\left(\boldsymbol{e}_{\hat{r}}\right)^{\theta}\left(\boldsymbol{e}_{\hat{r}}\right)^{\theta}=1+0=1, \\ & \boldsymbol{e}_{\hat{\theta}} \cdot \boldsymbol{e}_{\hat{\theta}}=g_{r r}\left(\boldsymbol{e}_{\hat{\theta}}\right)^{r}\left(\boldsymbol{e}_{\hat{\theta}}\right)^{r}+g_{\theta \theta}\left(\boldsymbol{e}_{\hat{\theta}}\right)^{\theta}\left(\boldsymbol{e}_{\hat{\theta}}\right)^{\theta}=0+r^{2} \frac{1}{r} \cdot \frac{1}{r}=1 . \tag{10.28} \end{align*}
This shows that our choice is correct, since $\eta_{\hat{r} \hat{r}}=\eta_{\hat{\theta} \hat{\theta}}=1$. Similarly, it follows that the coordinate basis vectors in the orthonormal basis are
\begin{equation*} \left(\boldsymbol{e}_{r}\right)^{\hat{\mu}}=(1,0), \quad\left(\boldsymbol{e}_{\theta}\right)^{\hat{\mu}}=(0, r) . \tag{10.29} \end{equation*}
These must obey the defining relationship $\boldsymbol{e}_{\mu}(x) \cdot \boldsymbol{e}_{\nu}(x)=g_{\mu \nu}(x)$, which they do.
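The checks in Example 10.5, together with the component transformations of eqns 10.20 and 10.21, can be sketched numerically. This is a minimal illustration, assuming a sample radius and sample vector components chosen arbitrarily; the function names are hypothetical, and each array stores one basis vector per row.

```python
import numpy as np


def polar_metric(r):
    """Metric components g_{mu nu} for ds^2 = dr^2 + r^2 dtheta^2."""
    return np.diag([1.0, r**2])


def orthonormal_in_coordinate(r):
    """Rows are (e_rhat)^mu and (e_thetahat)^mu, as in eqn 10.27."""
    return np.array([[1.0, 0.0], [0.0, 1.0 / r]])


def coordinate_in_orthonormal(r):
    """Rows are (e_r)^muhat and (e_theta)^muhat, as in eqn 10.29."""
    return np.array([[1.0, 0.0], [0.0, r]])


r = 2.5                               # illustrative radius
g = polar_metric(r)
E = orthonormal_in_coordinate(r)
E_inv = coordinate_in_orthonormal(r)

# e_ahat . e_bhat = g_{alpha beta} (e_ahat)^alpha (e_bhat)^beta
eta_check = E @ g @ E.T
# e_mu . e_nu evaluated with the flat (Euclidean) metric of the local frame
g_check = E_inv @ np.eye(2) @ E_inv.T

# Component conversion for an illustrative vector with (a^r, a^theta).
a_coord = np.array([0.3, -0.2])
a_hat = a_coord @ E_inv               # eqn 10.21: a^bhat = a^alpha (e_alpha)^bhat
a_back = a_hat @ E                    # eqn 10.20: a^alpha = a^bhat (e_bhat)^alpha

print(eta_check)
print(g_check)
print(a_hat, a_back)
```

Here `eta_check` should be the identity (the two-dimensional flat metric), `g_check` should reproduce $g_{\mu\nu}$, and converting the components there and back should return the starting values.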
A vielbein can be presented as a matrix as we now demonstrate.

Example 10.6

Consider the metric with line element $\mathrm{d} s^{2}=\mathrm{d} \theta^{2}+\sin ^{2} \theta \,\mathrm{d} \phi^{2}$. Using the method above, we can compute the matrix representing the components of the coordinate basis vectors in the orthonormal basis${}^{9}$
\begin{equation*}
\left(\boldsymbol{e}_{\mu}\right)^{\hat{\alpha}}=\left(\begin{array}{ll}
\left(\boldsymbol{e}_{1}\right)^{\hat{1}} & \left(\boldsymbol{e}_{1}\right)^{\hat{2}} \\
\left(\boldsymbol{e}_{2}\right)^{\hat{1}} & \left(\boldsymbol{e}_{2}\right)^{\hat{2}}
\end{array}\right)=\left(\begin{array}{cc}
1 & 0 \\
0 & \sin \theta
\end{array}\right) . \tag{10.30}
\end{equation*}
As the vielbein is simply a square matrix, the usual rules for inverses apply and so we have
\begin{equation*}
\left(\boldsymbol{e}_{\mu}\right)^{\hat{\alpha}}\left(\boldsymbol{e}_{\hat{\alpha}}\right)^{\nu}=\delta^{\nu}{ }_{\mu}, \quad\left(\boldsymbol{e}_{\mu}\right)^{\hat{\alpha}}\left(\boldsymbol{e}_{\hat{\beta}}\right)^{\mu}=\delta_{\hat{\beta}}^{\hat{\alpha}} . \tag{10.31}
\end{equation*}
The inverse matrix, which represents the components of the orthonormal basis vectors in the coordinate basis, then follows as
\begin{equation*}
\left(\boldsymbol{e}_{\hat{\alpha}}\right)^{\mu}=\left(\begin{array}{ll}
\left(\boldsymbol{e}_{\hat{1}}\right)^{1} & \left(\boldsymbol{e}_{\hat{1}}\right)^{2} \\
\left(\boldsymbol{e}_{\hat{2}}\right)^{1} & \left(\boldsymbol{e}_{\hat{2}}\right)^{2}
\end{array}\right)=\left(\begin{array}{cc}
1 & 0 \\
0 & \frac{1}{\sin \theta}
\end{array}\right) .
\end{equation*}
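The matrix manipulations of this example can be checked numerically. What follows is a minimal sketch rather than anything taken from the text: the helper names (`sphere_vielbein`, `sphere_inverse_vielbein`, `matmul`) and the sample angle are our own illustrative choices. It builds the matrix of eqn 10.30 and its inverse at one value of $\theta$, confirms that their product is the identity as in eqn 10.31, and recovers the metric components from the vielbein.

```python
import math

def sphere_vielbein(theta):
    """(e_mu)^alpha-hat for ds^2 = dtheta^2 + sin^2(theta) dphi^2.
    Rows are labelled by mu, columns by alpha-hat (eqn 10.30)."""
    return [[1.0, 0.0], [0.0, math.sin(theta)]]

def sphere_inverse_vielbein(theta):
    """(e_alpha-hat)^mu: rows labelled by alpha-hat, columns by mu."""
    return [[1.0, 0.0], [0.0, 1.0 / math.sin(theta)]]

def matmul(a, b):
    """Plain matrix product of two small nested-list matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

theta = 0.7  # hypothetical sample angle, away from the poles where sin(theta) = 0
E = sphere_vielbein(theta)
E_inv = sphere_inverse_vielbein(theta)

# Eqn 10.31: the two matrices are inverses of each other.
product = matmul(E, E_inv)

# The metric is recovered as g_{mu nu} = sum over alpha-hat of (e_mu)^alpha-hat (e_nu)^alpha-hat,
# because the orthonormal metric is the identity for this Euclidean-signature surface.
g = [[sum(E[m][a] * E[n][a] for a in range(2)) for n in range(2)] for m in range(2)]
```

The inverse is undefined at $\theta=0$ and $\theta=\pi$, where the coordinate system itself breaks down, so the sample angle should avoid the poles.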
Once we have the vielbein we need the general rule${}^{10}$ that the vielbein component $\left(\boldsymbol{e}_{\hat{\mu}}\right)^{\beta}$ removes a down coordinate component $\beta$ and replaces it with an orthonormal component $\hat{\mu}$. It also replaces the up component $p^{\hat{\mu}}$ with the component $p^{\beta}$. That is to say
\begin{equation*}
\left(\boldsymbol{e}_{\hat{\mu}}\right)^{\beta} p_{\beta}=p_{\hat{\mu}}, \quad\left(\boldsymbol{e}_{\hat{\mu}}\right)^{\beta} p^{\hat{\mu}}=p^{\beta} . \tag{10.38}
\end{equation*}
We also have
\begin{equation*}
\left(\boldsymbol{e}_{\mu}\right)^{\hat{\beta}} p_{\hat{\beta}}=p_{\mu}, \quad\left(\boldsymbol{e}_{\mu}\right)^{\hat{\beta}} p^{\mu}=p^{\hat{\beta}} . \tag{10.39}
\end{equation*}
These equations also apply to tensor components, so we might have, for example
\begin{equation*}
T_{\alpha \beta}=\left(\boldsymbol{e}_{\alpha}\right)^{\hat{\mu}}\left(\boldsymbol{e}_{\beta}\right)^{\hat{\nu}} T_{\hat{\mu} \hat{\nu}} . \tag{10.40}
\end{equation*}
To summarize, a vielbein allows us to set up orthonormal frames across all spacetime according to the defining rules
\begin{align*}
& g_{\mu \nu}=\eta_{\hat{\alpha} \hat{\beta}}\left(\boldsymbol{e}_{\mu}\right)^{\hat{\alpha}}\left(\boldsymbol{e}_{\nu}\right)^{\hat{\beta}}, \tag{10.41}\\
& \eta_{\hat{\alpha} \hat{\beta}}=g_{\mu \nu}\left(\boldsymbol{e}_{\hat{\alpha}}\right)^{\mu}\left(\boldsymbol{e}_{\hat{\beta}}\right)^{\nu} . \tag{10.42}
\end{align*}

Example 10.7

The use of vielbein components generalizes the rule that energy measured by an observer with velocity $\boldsymbol{u}_{\text{obs}}$ is given by $E=-\boldsymbol{p} \cdot \boldsymbol{u}_{\text{obs}}$, where $\boldsymbol{p}$ is the momentum vector. This is because we always choose $\boldsymbol{e}_{\hat{0}}=\boldsymbol{u}_{\text{obs}}$. To prove this we write
\begin{equation*}
E=-g_{\mu \nu} p^{\mu}\left(\boldsymbol{e}_{\hat{0}}\right)^{\nu}=-p_{\nu}\left(\boldsymbol{e}_{\hat{0}}\right)^{\nu} . \tag{10.43}
\end{equation*}
We know from the definitions of how a vielbein works that $-p_{\nu}\left(\boldsymbol{e}_{\hat{0}}\right)^{\nu}=-p_{\hat{0}}$. Finally, since in the orthonormal frame indices are manipulated with the Minkowski tensor $\eta$, we have that $E=-p_{\hat{0}}=-\eta_{\hat{0} \hat{0}} p^{\hat{0}}=p^{\hat{0}}$, as we require.
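The argument of this example can be traced with numbers. The sketch below is only an illustration under stated assumptions: the $(1+1)$-dimensional diagonal metric values and the momentum components are hypothetical, chosen to make the arithmetic easy to follow, and the helper `dot` is our own. It computes the energy once as $-\boldsymbol{p} \cdot \boldsymbol{u}_{\text{obs}}$ and once as the orthonormal component $p^{\hat{0}}$ for an observer at rest in the coordinates.

```python
import math

# Hypothetical diagonal metric components at one point, ordered (t, x).
g = [[-0.64, 0.0], [0.0, 2.25]]

# Observer at rest: u^mu = ((-g_00)^(-1/2), 0), so that g_{mu nu} u^mu u^nu = -1.
u = [1.0 / math.sqrt(-g[0][0]), 0.0]

# Hypothetical momentum components p^mu in the coordinate frame.
p = [3.0, 1.0]

def dot(metric, a, b):
    """Inner product g_{mu nu} a^mu b^nu."""
    return sum(metric[m][n] * a[m] * b[n]
               for m in range(len(a)) for n in range(len(b)))

# Route 1: E = -p . u_obs (eqn 10.43).
energy_from_dot = -dot(g, p, u)

# Route 2: lower the index, contract with (e_0hat)^nu = u^nu to get p_0hat,
# then raise with eta^{0hat 0hat} = -1 to get p^0hat.
p_lower = [sum(g[m][n] * p[m] for m in range(2)) for n in range(2)]
p_hat0_lower = sum(p_lower[n] * u[n] for n in range(2))
p_hat0_upper = -p_hat0_lower
```

With these sample values both routes give $E=2.4$, which is $\sqrt{-g_{00}}\, p^{0}$ as expected for an observer at rest.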
Vielbein components will be very useful to us. Next, we need to identify some examples of frames in which observers make their measurements.
${}^{9}$ The vielbein components in this case can be written as
\begin{equation*}
\left(\boldsymbol{e}_{\theta}\right)^{\hat{\theta}}=1, \quad\left(\boldsymbol{e}_{\phi}\right)^{\hat{\phi}}=\sin \theta .
\end{equation*}
${}^{10}$ Life is made easier in understanding the action of a vielbein if we also employ our knowledge of 1-forms. Formally we define the action of the vielbein on basis vectors and basis 1-forms via
\begin{equation*}
\begin{array}{ll}
\boldsymbol{e}_{\hat{\mu}}=\left(\boldsymbol{e}_{\hat{\mu}}\right)^{\alpha} \boldsymbol{e}_{\alpha}, & \boldsymbol{\omega}^{\hat{\mu}}=\left(\boldsymbol{e}_{\alpha}\right)^{\hat{\mu}} \boldsymbol{\omega}^{\alpha}, \\
\boldsymbol{e}_{\alpha}=\left(\boldsymbol{e}_{\alpha}\right)^{\hat{\mu}} \boldsymbol{e}_{\hat{\mu}}, & \boldsymbol{\omega}^{\alpha}=\left(\boldsymbol{e}_{\hat{\mu}}\right)^{\alpha} \boldsymbol{\omega}^{\hat{\mu}} .
\end{array} \tag{10.33}
\end{equation*}
From these, the rules in the text can be confirmed. Equivalently, we can return to the definition
\begin{equation*}
\boldsymbol{v}=v^{\mu} \boldsymbol{e}_{\mu}=v^{\hat{\nu}} \boldsymbol{e}_{\hat{\nu}} \tag{10.34}
\end{equation*}
and use the inner product $\left\langle\boldsymbol{\omega}^{\alpha}, \boldsymbol{e}_{\mu}\right\rangle=\delta^{\alpha}{ }_{\mu}$. To remove a vector like $\boldsymbol{e}_{\mu}$ from the second term in eqn 10.34, we note that, on taking an inner product with the basis 1-form $\boldsymbol{\omega}^{\mu}$, we have
\begin{equation*}
v^{\mu}=v^{\hat{\nu}}\left\langle\boldsymbol{\omega}^{\mu}, \boldsymbol{e}_{\hat{\nu}}\right\rangle=v^{\hat{\nu}}\left(\boldsymbol{e}_{\hat{\nu}}\right)^{\mu}, \tag{10.35}
\end{equation*}
where we've fixed the vielbein components by
\begin{equation*}
\left\langle\boldsymbol{\omega}^{\mu}, \boldsymbol{e}_{\hat{\nu}}\right\rangle=\left\langle\boldsymbol{\omega}^{\mu}, \boldsymbol{e}_{\alpha}\right\rangle\left(\boldsymbol{e}_{\hat{\nu}}\right)^{\alpha}=\left(\boldsymbol{e}_{\hat{\nu}}\right)^{\mu} . \tag{10.36}
\end{equation*}
Note here that $\left(\boldsymbol{e}_{\hat{\nu}}\right)^{\mu}=\left\langle\boldsymbol{\omega}^{\mu}, \boldsymbol{e}_{\hat{\nu}}\right\rangle \neq \delta^{\mu}{ }_{\hat{\nu}}$, since here we're working with the basis vectors of two different coordinate systems. The same method also yields
\begin{equation*}
v^{\hat{\mu}}=v^{\nu}\left\langle\boldsymbol{\omega}^{\hat{\mu}}, \boldsymbol{e}_{\nu}\right\rangle=v^{\nu}\left(\boldsymbol{e}_{\nu}\right)^{\hat{\mu}} .
\end{equation*}
The other relationships can be confirmed using the idea that a 1-form can be written as $\overline{\boldsymbol{u}}=u_{\mu} \boldsymbol{\omega}^{\mu}=u_{\hat{\mu}} \boldsymbol{\omega}^{\hat{\mu}}$.

10.3 The orthonormal frame

In many of the cases we'll consider later in the book, we shall find that a very convenient orthonormal frame in which to carry out computations, particularly of curvature, is one where the observer is at rest relative to the coordinate frame. That is, the velocity expressed in the coordinate frame is $u^{\mu}=\mathrm{d} x^{\mu} / \mathrm{d} \tau=\left(u^{0}, 0,0,0\right)$, with $u^{0}$ fixed such that $g_{\mu \nu} u^{\mu} u^{\nu}=g_{00}\left(u^{0}\right)^{2}=-1$. This observer then chooses $\boldsymbol{e}_{\hat{t}}=\boldsymbol{u}$. We will call this rather natural choice${}^{11}$ the stationary orthonormal frame. In such a frame, we have a metric that looks locally like the Minkowski metric, but there is no reason to believe that the connection coefficients should vanish.${}^{12}$
To find the orthonormal frame we effectively diagonalize the metric and normalize the components. That is, we are trying to solve
\begin{equation*}
\boldsymbol{g}\left(\boldsymbol{e}_{\hat{\alpha}}, \boldsymbol{e}_{\hat{\beta}}\right)=\eta_{\hat{\alpha} \hat{\beta}}, \tag{10.44}
\end{equation*}
which is equivalent to the component equation
\begin{equation*}
g_{\mu \nu}\left(\boldsymbol{e}_{\hat{\alpha}}\right)^{\mu}\left(\boldsymbol{e}_{\hat{\beta}}\right)^{\nu}=\eta_{\hat{\alpha} \hat{\beta}} . \tag{10.45}
\end{equation*}
Example 10.8
The matrix form for $(1+1)$-dimensional spacetime is
\begin{equation*}
\left(\begin{array}{ll}
\left(\boldsymbol{e}_{\hat{0}}\right)^{0} & \left(\boldsymbol{e}_{\hat{0}}\right)^{1} \\
\left(\boldsymbol{e}_{\hat{1}}\right)^{0} & \left(\boldsymbol{e}_{\hat{1}}\right)^{1}
\end{array}\right)\left(\begin{array}{ll}
g_{00} & g_{01} \\
g_{10} & g_{11}
\end{array}\right)\left(\begin{array}{ll}
\left(\boldsymbol{e}_{\hat{0}}\right)^{0} & \left(\boldsymbol{e}_{\hat{1}}\right)^{0} \\
\left(\boldsymbol{e}_{\hat{0}}\right)^{1} & \left(\boldsymbol{e}_{\hat{1}}\right)^{1}
\end{array}\right)=\left(\begin{array}{cc}
-1 & 0 \\
0 & 1
\end{array}\right) . \tag{10.46}
\end{equation*}
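When the metric is not diagonal, eqn 10.46 can still be solved by hand in $(1+1)$ dimensions. One way to do so, sketched here under our own assumptions rather than as the method of the text, is a Gram-Schmidt procedure: take $\boldsymbol{e}_{\hat{0}}$ along $\boldsymbol{e}_{0}$ (which requires $g_{00}<0$), remove from $\boldsymbol{e}_{1}$ its part along $\boldsymbol{e}_{\hat{0}}$, and normalize. The function names and the sample metric are hypothetical.

```python
import math

def inner(g, a, b):
    """Inner product g_{mu nu} a^mu b^nu in (1+1) dimensions."""
    return sum(g[m][n] * a[m] * b[n] for m in range(2) for n in range(2))

def orthonormal_frame_2d(g):
    """Return rows [(e_0hat)^mu, (e_1hat)^mu] solving eqn 10.46.
    Assumes a Lorentzian metric with g_00 < 0, so that e_0 is timelike."""
    if g[0][0] >= 0:
        raise ValueError("expects g_00 < 0 so that e_0 is timelike")
    # Timelike leg: the observer at rest, e_0hat = e_0 / sqrt(-g_00).
    e0_hat = [1.0 / math.sqrt(-g[0][0]), 0.0]
    # Start from e_1 and remove its projection on e_0hat.
    # Since e_0hat . e_0hat = -1, the projection is subtracted with a plus sign.
    w = [0.0, 1.0]
    overlap = inner(g, w, e0_hat)
    v = [w[i] + overlap * e0_hat[i] for i in range(2)]
    # Normalize the spacelike leg.
    norm = math.sqrt(inner(g, v, v))
    e1_hat = [x / norm for x in v]
    return [e0_hat, e1_hat]

# Hypothetical non-diagonal metric with Lorentzian signature (determinant < 0).
g = [[-1.0, 0.5], [0.5, 1.0]]
frame = orthonormal_frame_2d(g)

# Left-hand side of eqn 10.46, entry by entry; it should reproduce diag(-1, 1).
eta = [[inner(g, frame[a], frame[b]) for b in range(2)] for a in range(2)]
```

The frame found this way is not unique: any local Lorentz boost of it also satisfies eqn 10.46. The choice here singles out the observer at rest in the coordinates.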
In the most commonly encountered case of a metric that is already diagonal, we can simply normalize the components as discussed in the next example.
Example 10.9
For an observer at rest, $\boldsymbol{u}$ has a single non-zero component $u^{0}$. Its value is given, via the normalization of the velocity, by
\begin{equation*}
g_{00}\left(u^{0}\right)^{2}=-1, \tag{10.47}
\end{equation*}
so we must have $u^{0}=\left(-g_{00}\right)^{-1 / 2}$. Our rule $\boldsymbol{e}_{\hat{0}}=\boldsymbol{u}$ then mandates $\boldsymbol{e}_{\hat{0}}=\boldsymbol{e}_{0} / \sqrt{-g_{00}}$. For the diagonal metric we can then pick out an orthonormal basis
\begin{equation*}
\boldsymbol{e}_{\hat{0}}=\frac{1}{\sqrt{-g_{00}}} \boldsymbol{e}_{0}, \quad \boldsymbol{e}_{\hat{i}}=\frac{1}{\sqrt{g_{i i}}} \boldsymbol{e}_{i}, \tag{10.48}
\end{equation*}
with components in the coordinate frame of
\begin{equation*}
\left(\boldsymbol{e}_{\hat{0}}\right)^{0}=\frac{1}{\sqrt{-g_{00}}}, \quad\left(\boldsymbol{e}_{\hat{i}}\right)^{i}=\frac{1}{\sqrt{g_{i i}}} . \tag{10.49}
\end{equation*}
We can check this works by considering the rule $g_{\mu \nu}\left(\boldsymbol{e}_{\hat{\alpha}}\right)^{\mu}\left(\boldsymbol{e}_{\hat{\beta}}\right)^{\nu}=\eta_{\hat{\alpha} \hat{\beta}}$. If the metric $\boldsymbol{g}$ is diagonal then, by inspection, we have
\begin{align*}
g_{\mu \mu}\left(\boldsymbol{e}_{\hat{0}}\right)^{\mu}\left(\boldsymbol{e}_{\hat{0}}\right)^{\mu} & =-1, \\
g_{\mu \mu}\left(\boldsymbol{e}_{\hat{i}}\right)^{\mu}\left(\boldsymbol{e}_{\hat{i}}\right)^{\mu} & =1, \tag{10.50}
\end{align*}
where no summation is implied.
The normalization procedure in the last example makes identifying the vielbein components for the orthonormal frame trivial for the diagonal metric. We simply normalize by writing
\begin{equation*}
\left(\boldsymbol{e}_{\hat{\mu}}\right)^{\mu}=1 / \sqrt{\left|g_{\mu \mu}\right|}, \quad\left(\boldsymbol{e}_{\mu}\right)^{\hat{\mu}}=\sqrt{\left|g_{\mu \mu}\right|} . \tag{10.51}
\end{equation*}
In this way, we can think of the vielbein components as the square roots of the metric components.
Example 10.10
As we shall see later, a spherically symmetric gravitating object of mass $M$ gives rise to the Schwarzschild metric with line element given by${}^{14}$
\begin{equation*}
\mathrm{d} s^{2}=-\left(1-\frac{2 M}{r}\right) \mathrm{d} t^{2}+\left(1-\frac{2 M}{r}\right)^{-1} \mathrm{d} r^{2}+r^{2}\left(\mathrm{d} \theta^{2}+\sin ^{2} \theta \,\mathrm{d} \phi^{2}\right) . \tag{10.52}
\end{equation*}
We can identify an orthonormal frame in this so-called Schwarzschild geometry, which has coordinates ordered $(t, r, \theta, \phi)$. An observer at rest in this geometry has a velocity vector $\boldsymbol{u}$ with components
\begin{equation*}
u^{\mu}=\left[\left(1-\frac{2 M}{r}\right)^{-\frac{1}{2}}, 0,0,0\right], \tag{10.53}
\end{equation*}
so that $\boldsymbol{u} \cdot \boldsymbol{u}=g_{t t}\left(1-\frac{2 M}{r}\right)^{-1}=-1$, as it must. We set $\boldsymbol{e}_{\hat{t}}=\boldsymbol{u}$. We then choose
\begin{equation*}
\boldsymbol{e}_{\hat{t}}=\left(1-\frac{2 M}{r}\right)^{-\frac{1}{2}} \boldsymbol{e}_{t}, \quad \boldsymbol{e}_{\hat{r}}=\left(1-\frac{2 M}{r}\right)^{\frac{1}{2}} \boldsymbol{e}_{r}, \quad \boldsymbol{e}_{\hat{\theta}}=\frac{1}{r} \boldsymbol{e}_{\theta}, \quad \boldsymbol{e}_{\hat{\phi}}=\frac{1}{r \sin \theta} \boldsymbol{e}_{\phi} . \tag{10.54}
\end{equation*}
It's not hard to see that these must obey the defining rules above. Alternatively, we can write the non-zero components of the vielbein explicitly${}^{15}$
\begin{equation*}
\left(\boldsymbol{e}_{\hat{t}}\right)^{t}=\left(1-\frac{2 M}{r}\right)^{-\frac{1}{2}}, \quad\left(\boldsymbol{e}_{\hat{r}}\right)^{r}=\left(1-\frac{2 M}{r}\right)^{\frac{1}{2}}, \quad\left(\boldsymbol{e}_{\hat{\theta}}\right)^{\theta}=\frac{1}{r}, \quad\left(\boldsymbol{e}_{\hat{\phi}}\right)^{\phi}=\frac{1}{r \sin \theta} .
\end{equation*}
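The normalization rule of eqn 10.51 is simple enough to automate. The following is a minimal sketch, assuming a diagonal metric supplied as a list of its diagonal entries; the function names and the sample values of $M$, $r$ and $\theta$ are hypothetical, and the check only makes sense outside $r=2M$, where $g_{tt}<0$. It applies the rule to the Schwarzschild line element of eqn 10.52 and compares the output with the components listed above.

```python
import math

def diagonal_vielbein(g_diag):
    """(e_muhat)^mu = 1 / sqrt(|g_{mu mu}|) for a diagonal metric (eqn 10.51)."""
    return [1.0 / math.sqrt(abs(x)) for x in g_diag]

def schwarzschild_diagonal(M, r, theta):
    """Diagonal entries (g_tt, g_rr, g_thetatheta, g_phiphi) of eqn 10.52, for r > 2M."""
    f = 1.0 - 2.0 * M / r
    return [-f, 1.0 / f, r ** 2, (r * math.sin(theta)) ** 2]

# Hypothetical sample point, in units where G = c = 1.
M, r, theta = 1.0, 8.0, 1.1
g_diag = schwarzschild_diagonal(M, r, theta)
e_hat = diagonal_vielbein(g_diag)

# Components quoted in the text for comparison.
f = 1.0 - 2.0 * M / r
expected = [f ** -0.5, f ** 0.5, 1.0 / r, 1.0 / (r * math.sin(theta))]

# Defining rule (eqn 10.45) on the diagonal: g_{mu mu} ((e_muhat)^mu)^2 = eta_{muhat muhat}.
eta_diag = [g_diag[i] * e_hat[i] ** 2 for i in range(4)]
```

Because the absolute value in eqn 10.51 discards the sign of $g_{tt}$, the signature reappears only when the vielbein is contracted back with the metric, as in the last line.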
Of course the point of identifying vielbein components is to use them, in ways such as that shown in the next example.

Example 10.11

Spacetime is described by a metric line element${}^{16}$
\begin{equation*}
\mathrm{d} s^{2}=-\mathrm{d} t^{2}+\mathrm{d} \chi^{2}+\chi^{2}\left(\mathrm{d} \theta^{2}+\sin ^{2} \theta \,\mathrm{d} \phi^{2}\right) . \tag{10.56}
\end{equation*}
In the coordinate frame, the energy-momentum tensor for the fluid has components
\begin{equation*}
T_{t t}=\rho, \quad T_{\chi \chi}=p, \quad T_{\theta \theta}=p \chi^{2}, \quad T_{\phi \phi}=p \chi^{2} \sin ^{2} \theta, \tag{10.57}
\end{equation*}
where $\rho$ is an energy density and $p$ is a pressure. We can express these in the orthonormal frame. So, for example, $T_{\theta \theta}$ becomes
\begin{equation*}
T_{\hat{\theta} \hat{\theta}}=\left(\boldsymbol{e}_{\hat{\theta}}\right)^{\theta}\left(\boldsymbol{e}_{\hat{\theta}}\right)^{\theta} T_{\theta \theta}=p . \tag{10.58}
\end{equation*}
We find that in the orthonormal frame, we have
\begin{equation*}
T_{\hat{t} \hat{t}}=\rho, \quad T_{\hat{\chi} \hat{\chi}}=p, \quad T_{\hat{\theta} \hat{\theta}}=p, \quad T_{\hat{\phi} \hat{\phi}}=p, \tag{10.59}
\end{equation*}
which is a useful simplification.
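The conversion in this example can be reproduced numerically. The sketch below is an illustration only: the values of $\rho$, $p$, $\chi$ and $\theta$ are hypothetical, and the variable names are ours. It uses the diagonal vielbein $\left(\boldsymbol{e}_{\hat{\mu}}\right)^{\mu}=1 / \sqrt{\left|g_{\mu \mu}\right|}$ for the line element of eqn 10.56 and applies the pattern of eqn 10.58 to each diagonal component.

```python
import math

# Hypothetical sample values for the fluid and the point of evaluation.
rho, p = 2.0, 0.5
chi, theta = 3.0, 0.9

# Diagonal metric entries of eqn 10.56, ordered (t, chi, theta, phi).
g_diag = [-1.0, 1.0, chi ** 2, (chi * math.sin(theta)) ** 2]

# Coordinate-frame components of the energy-momentum tensor (eqn 10.57).
T_coord = [rho, p, p * chi ** 2, p * chi ** 2 * math.sin(theta) ** 2]

# Diagonal vielbein components (e_muhat)^mu for this metric.
e_hat = [1.0 / math.sqrt(abs(x)) for x in g_diag]

# T_{muhat muhat} = (e_muhat)^mu (e_muhat)^mu T_{mu mu}, with no sum, as in eqn 10.58.
T_hat = [e_hat[i] ** 2 * T_coord[i] for i in range(4)]
```

The factors of $\chi^{2}$ and $\sin ^{2} \theta$ cancel, leaving $(\rho, p, p, p)$: the orthonormal components are the quantities a local observer would actually measure.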

10.4 Freely falling frames

There are several possible orthonormal frames that can be identified. In Chapter 6, we discussed the possibility of finding locally inertial frames (LIFs) in which, in addition to the metric being identical to the Minkowski metric, the frame also has the property that the first derivatives of the components $\partial g_{\mu \nu} / \partial x^{\alpha}$ vanish at the point considered. This implies that the connection coefficients $\Gamma^{\mu}{ }_{\alpha \beta}$ also vanish at that point.
There are a few methods for identifying LIFs, but one of the most useful is the freely falling frame. Recall that a body that is freely falling follows a (timelike) geodesic curve in spacetime. The equivalence principle tells us that a sufficiently small laboratory in free fall should not be able to detect any gravitation. As a result of this, we might expect the laboratory's coordinate system has vanishing connection coefficients. This is indeed the case.
Freely falling frames are therefore defined to possess a system of coordinates in which the connection coefficients vanish along the geodesic that describes their free fall. The frame is described via a set of orthonormal basis vectors $\boldsymbol{e}_{\hat{\alpha}}(\tau)$ that we should determine in order to be able to understand the results of measurements.
Consider the geodesic of the falling observer with proper time $\tau$, which is the curve $x^{\mu}(\tau)$. The observer's four-velocity is $\boldsymbol{u}(\tau)=\frac{\mathrm{d} x^{\mu}}{\mathrm{d} \tau} \cdot \boldsymbol{e}_{\mu}$. This vector is identified with the zeroth basis vector $\boldsymbol{e}_{\hat{0}}(\tau)=\boldsymbol{u}(\tau)$. We can find the spatial basis vectors at some point along the geodesic by identifying a set of orthonormal vectors that are perpendicular to $\boldsymbol{u}$. The basis vectors at other points could be found by parallel transporting the basis vectors along the geodesic. Since the connection coefficients vanish, the freely falling frame is then defined by${}^{17}$
\begin{equation*}
\nabla_{\boldsymbol{u}} \boldsymbol{e}_{\hat{\alpha}}=0 \tag{10.60}
\end{equation*}
for all $\alpha$. Notice that this is automatically satisfied for $\boldsymbol{e}_{\hat{0}}=\boldsymbol{u}$ by definition for a geodesic.

Example 10.12

We shall see in Chapter 22 that falling radially inwards from rest at infinity in the Schwarzschild geometry, a particle has velocity $\boldsymbol{u}$ with components in the coordinate frame of
\begin{equation*}
u^{\alpha}=\left[\left(1-\frac{2 M}{r}\right)^{-1},-\left(\frac{2 M}{r}\right)^{\frac{1}{2}}, 0,0\right] . \tag{10.61}
\end{equation*}
We write down that $\boldsymbol{e}_{\hat{t}}=\boldsymbol{u}(\tau)$. As far as it's possible to identify them, diagonal vielbein components are the simplest to use. So, as in the previous examples, we write $\left(\boldsymbol{e}_{\hat{\theta}}\right)^{\theta}=1 / r$ and $\left(\boldsymbol{e}_{\hat{\phi}}\right)^{\phi}=1 /(r \sin \theta)$. For $\left(\boldsymbol{e}_{\hat{r}}\right)^{\alpha}$ we have $g_{\mu \nu}\left(\boldsymbol{e}_{\hat{r}}\right)^{\mu}\left(\boldsymbol{e}_{\hat{r}}\right)^{\nu}=\eta_{\hat{r} \hat{r}}$ or
\begin{equation*}
-\left(1-\frac{2 M}{r}\right)\left(\boldsymbol{e}_{\hat{r}}\right)^{t}\left(\boldsymbol{e}_{\hat{r}}\right)^{t}+\left(1-\frac{2 M}{r}\right)^{-1}\left(\boldsymbol{e}_{\hat{r}}\right)^{r}\left(\boldsymbol{e}_{\hat{r}}\right)^{r}=1 .
\end{equation*}
Choose $\left(\boldsymbol{e}_{\hat{r}}\right)^{r}=1$. The normalization condition then fixes $\left(\boldsymbol{e}_{\hat{r}}\right)^{t}$ up to a sign, which is set by requiring $\boldsymbol{e}_{\hat{r}} \cdot \boldsymbol{e}_{\hat{t}}=0$. The result is
\begin{equation*}
\begin{aligned}
& \left(\boldsymbol{e}_{\hat{t}}\right)^{\alpha}=\left(\boldsymbol{e}_{\hat{0}}\right)^{\alpha}=\left((1-2 M / r)^{-1},-(2 M / r)^{\frac{1}{2}}, 0,0\right), \\
& \left(\boldsymbol{e}_{\hat{r}}\right)^{\alpha}=\left(\boldsymbol{e}_{\hat{1}}\right)^{\alpha}=\left(-(2 M / r)^{\frac{1}{2}}(1-2 M / r)^{-1}, 1,0,0\right), \\
& \left(\boldsymbol{e}_{\hat{\theta}}\right)^{\alpha}=\left(\boldsymbol{e}_{\hat{2}}\right)^{\alpha}=(0,0,1 / r, 0), \\
& \left(\boldsymbol{e}_{\hat{\phi}}\right)^{\alpha}=\left(\boldsymbol{e}_{\hat{3}}\right)^{\alpha}=\left(0,0,0,(r \sin \theta)^{-1}\right) .
\end{aligned}
\end{equation*}
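It is worth confirming that these four vectors really are orthonormal, since $\boldsymbol{e}_{\hat{t}}$ and $\boldsymbol{e}_{\hat{r}}$ are no longer aligned with the coordinate directions. The following is a small numerical sketch under our own assumptions: the function names and the sample point are hypothetical, and the check covers only the orthonormality condition $g_{\mu \nu}\left(\boldsymbol{e}_{\hat{\alpha}}\right)^{\mu}\left(\boldsymbol{e}_{\hat{\beta}}\right)^{\nu}=\eta_{\hat{\alpha} \hat{\beta}}$, not the parallel-transport condition of eqn 10.60.

```python
import math

def schwarzschild_metric(M, r, theta):
    """4x4 Schwarzschild metric in coordinates (t, r, theta, phi), for r > 2M."""
    f = 1.0 - 2.0 * M / r
    diag = [-f, 1.0 / f, r ** 2, (r * math.sin(theta)) ** 2]
    return [[diag[i] if i == j else 0.0 for j in range(4)] for i in range(4)]

def falling_tetrad(M, r, theta):
    """Rows are (e_alphahat)^mu for radial infall from rest at infinity, as listed in the text."""
    f = 1.0 - 2.0 * M / r
    s = math.sqrt(2.0 * M / r)
    return [
        [1.0 / f, -s, 0.0, 0.0],                         # e_that = u
        [-s / f, 1.0, 0.0, 0.0],                         # e_rhat
        [0.0, 0.0, 1.0 / r, 0.0],                        # e_thetahat
        [0.0, 0.0, 0.0, 1.0 / (r * math.sin(theta))],    # e_phihat
    ]

def inner(g, a, b):
    """Inner product g_{mu nu} a^mu b^nu."""
    return sum(g[m][n] * a[m] * b[n] for m in range(4) for n in range(4))

# Hypothetical sample point outside the horizon, in units where G = c = 1.
M, r, theta = 1.0, 6.0, 1.2
g = schwarzschild_metric(M, r, theta)
tetrad = falling_tetrad(M, r, theta)

# Matrix of inner products; it should equal diag(-1, 1, 1, 1).
gram = [[inner(g, tetrad[a], tetrad[b]) for b in range(4)] for a in range(4)]
```

The off-diagonal entry between $\boldsymbol{e}_{\hat{t}}$ and $\boldsymbol{e}_{\hat{r}}$ vanishes only because of the negative sign of $\left(\boldsymbol{e}_{\hat{r}}\right)^{t}$; flipping that sign would keep the normalization but spoil orthogonality.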
We saw in Chapter 7, Example 7.10, that for a system in which the connection coefficients vanish, we should have $\mathrm{D} \boldsymbol{\chi} / \mathrm{d} \tau=\mathrm{d} \boldsymbol{\chi} / \mathrm{d} \tau$. We should check that $\boldsymbol{\chi}$ in a freely falling frame (defined by $\nabla_{\boldsymbol{u}} \boldsymbol{e}_{\hat{\alpha}}=0$) has this property.

Example 10.13

Note that in the freely falling frame we have the defining fact that the basis vectors and basis 1-forms are parallel transported:
\begin{equation*}
\nabla_{\boldsymbol{u}} \boldsymbol{e}_{\hat{\alpha}}=0, \quad \nabla_{\boldsymbol{u}} \boldsymbol{\omega}^{\hat{\alpha}}=0 . \tag{10.64}
\end{equation*}
Our first task is to express the covariant derivative vector $\nabla_{\boldsymbol{u}} \boldsymbol{\chi}$ in the freely falling frame. To do this we use a vielbein $\left(\boldsymbol{e}_{\alpha}\right)^{\hat{\mu}}$ to bring its components into the orthonormal freely falling system. The components are $\left(\nabla_{\boldsymbol{u}} \boldsymbol{\chi}\right)^{\hat{\mu}}=\left(\boldsymbol{e}_{\alpha}\right)^{\hat{\mu}}\left(\nabla_{\boldsymbol{u}} \boldsymbol{\chi}\right)^{\alpha}$, which can be rewritten as${}^{18}$
But this is just the covariant derivative of the $\hat{\mu}$ component of $\boldsymbol{\chi}$, or $\nabla_{\boldsymbol{u}}\left(\chi^{\hat{\mu}}\right)$. The covariant derivative of a component of a vector is the same as the covariant derivative of some scalar function, which (as we saw in the earlier sidenote) is just the directional derivative, which is to say
\begin{align*}
\nabla_{\boldsymbol{u}}\left(\chi^{\hat{\mu}}\right) & =u^{\beta} \frac{\partial}{\partial x^{\beta}} \chi^{\hat{\mu}} \tag{10.69}\\
& =\frac{\mathrm{d} x^{\beta}}{\mathrm{d} \tau} \cdot \frac{\partial}{\partial x^{\beta}} \chi^{\hat{\mu}}=\frac{\mathrm{d} \chi^{\hat{\mu}}}{\mathrm{d} \tau}, \tag{10.70}
\end{align*}
where $\tau$ is the proper time. Putting things together, we conclude
\begin{equation*}
\left(\frac{\mathrm{D} \boldsymbol{\chi}}{\mathrm{d} \tau}\right)^{\hat{\mu}} \equiv\left(\nabla_{\boldsymbol{u}} \boldsymbol{\chi}\right)^{\hat{\mu}}=\frac{\mathrm{d} \chi^{\hat{\mu}}}{\mathrm{d} \tau} . \tag{10.71}
\end{equation*}
At various points in the remainder of the book, we will deploy vielbein components in order to efficiently perform certain calculations. In the next chapter, we turn to the long-awaited method of determining the curvature of spacetime.
18 18 ^(18){ }^{18}18 A useful step in seeing this is to write the vielbein components as ( e α ) μ ^ = e α μ ^ = (e_(alpha))^( hat(mu))=\left(\boldsymbol{e}_{\alpha}\right)^{\hat{\mu}}=(eα)μ^= ω μ ^ , e α ω μ ^ , e α (:omega^( hat(mu)),e_(alpha):)\left\langle\boldsymbol{\omega}^{\hat{\mu}}, \boldsymbol{e}_{\alpha}\right\rangleωμ^,eα, so we have
ω μ ^ , e α ( u χ ) α = ω μ ^ , u χ . ω μ ^ , e α u χ α = ω μ ^ , u χ . (:omega^( hat(mu)),e_(alpha):)(grad_(u)chi)^(alpha)=(:omega^( hat(mu)),grad_(u)chi:).\left\langle\boldsymbol{\omega}^{\hat{\mu}}, \boldsymbol{e}_{\alpha}\right\rangle\left(\boldsymbol{\nabla}_{\boldsymbol{u}} \boldsymbol{\chi}\right)^{\alpha}=\left\langle\boldsymbol{\omega}^{\hat{\mu}}, \boldsymbol{\nabla}_{\boldsymbol{u}} \boldsymbol{\chi}\right\rangle .ωμ^,eα(uχ)α=ωμ^,uχ.
( 10.65 ) ( 10.65 ) (10.65)(10.65)(10.65)
Next, we note that
u ω μ ^ , χ = u ω μ ^ , χ + ω μ ^ , u χ u ω μ ^ , χ = u ω μ ^ , χ + ω μ ^ , u χ grad_(u)(:omega^( hat(mu)),chi:)=(:grad_(u)omega^( hat(mu)),chi:)+(:omega^( hat(mu)),grad_(u)chi:)\boldsymbol{\nabla}_{\boldsymbol{u}}\left\langle\boldsymbol{\omega}^{\hat{\mu}}, \boldsymbol{\chi}\right\rangle=\left\langle\boldsymbol{\nabla}_{\boldsymbol{u}} \boldsymbol{\omega}^{\hat{\mu}}, \boldsymbol{\chi}\right\rangle+\left\langle\boldsymbol{\omega}^{\hat{\mu}}, \boldsymbol{\nabla}_{\boldsymbol{u}} \boldsymbol{\chi}\right\rangleuωμ^,χ=uωμ^,χ+ωμ^,uχ, but u ω μ ^ = 0 u ω μ ^ = 0 grad_(u)omega^( hat(mu))=0\nabla_{u} \boldsymbol{\omega}^{\hat{\mu}}=0uωμ^=0 by definition of the freely falling frame. So the right-hand side of eqn 10.65 becomes
u ω μ ^ , χ = u χ μ ^ u ω μ ^ , χ = u χ μ ^ grad_(u)(:omega^( hat(mu)),chi:)=grad_(u)chi^( hat(mu))\boldsymbol{\nabla}_{\boldsymbol{u}}\left\langle\boldsymbol{\omega}^{\hat{\mu}}, \boldsymbol{\chi}\right\rangle=\boldsymbol{\nabla}_{\boldsymbol{u}} \chi^{\hat{\mu}}uωμ^,χ=uχμ^
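The result of Example 10.13, eqn 10.71, can be checked numerically in a toy setting. The sketch below is not taken from the text: it assumes the flat Euclidean plane in polar coordinates $(r, \theta)$, where the Cartesian basis is parallel transported along every curve and so plays the role of a frame with $\nabla_{u} \boldsymbol{e}_{\hat{\alpha}}=0$. The curve, the vector components and all function names are hypothetical choices made for illustration, $\tau$ is simply the curve parameter, and derivatives are approximated by central differences.

```python
import math

# Toy setting (an assumption, not from the text): flat plane, polar coordinates.
# Non-zero connection coefficients:
#   Gamma^r_{theta theta} = -r,  Gamma^theta_{r theta} = Gamma^theta_{theta r} = 1/r.

def curve(tau):
    """Hypothetical curve (r(tau), theta(tau))."""
    return 2.0 + 0.5 * tau, 0.3 * tau ** 2

def chi(tau):
    """Hypothetical vector along the curve: coordinate components (chi^r, chi^theta)."""
    return math.sin(tau) + 2.0, math.cos(2.0 * tau)

def vielbein(r, theta):
    """(e_alpha)^mu-hat = <omega^mu-hat, e_alpha>; rows are (x-hat, y-hat), columns (r, theta)."""
    return [[math.cos(theta), -r * math.sin(theta)],
            [math.sin(theta), r * math.cos(theta)]]

def ddtau(f, tau, h=1e-5):
    """Central-difference derivative of a tuple-valued function of tau."""
    return tuple((a - b) / (2.0 * h) for a, b in zip(f(tau + h), f(tau - h)))

def covariant_derivative(tau):
    """(D chi/d tau)^alpha = d chi^alpha/d tau + Gamma^alpha_{beta gamma} u^beta chi^gamma."""
    r, _ = curve(tau)
    u_r, u_th = ddtau(curve, tau)
    c_r, c_th = chi(tau)
    dc_r, dc_th = ddtau(chi, tau)
    return (dc_r - r * u_th * c_th,
            dc_th + (u_r * c_th + u_th * c_r) / r)

def chi_hat(tau):
    """Components chi^mu-hat in the parallel-transported (Cartesian) frame."""
    E = vielbein(*curve(tau))
    c = chi(tau)
    return tuple(sum(E[m][a] * c[a] for a in range(2)) for m in range(2))

def both_sides(tau):
    """Return ((nabla_u chi)^mu-hat, d chi^mu-hat/d tau), the two sides of eqn 10.71."""
    E = vielbein(*curve(tau))
    D = covariant_derivative(tau)
    lhs = tuple(sum(E[m][a] * D[a] for a in range(2)) for m in range(2))
    rhs = ddtau(chi_hat, tau)
    return lhs, rhs
```

Calling `both_sides(tau)` for a few values of `tau` should return two pairs of numbers that agree to within the finite-difference error. The connection terms in the coordinate frame are exactly what is needed to reproduce the ordinary derivative of the hatted components.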

Chapter summary

  • Measurements in general relativity are carried out by observers who carry around an orthonormal coordinate system with basis vectors $\boldsymbol{e}_{\hat{\alpha}}$. An observer with velocity $\boldsymbol{u}$ has $\boldsymbol{e}_{\hat{0}}=\boldsymbol{u}$.
  • A vielbein, with components $\left(\boldsymbol{e}_{\mu}\right)^{\hat{\alpha}}$, is a matrix that allows the transformation between orthonormal frames and coordinate frames.
  • A particularly convenient frame is an orthonormal one where the observer is at rest relative to the coordinate frame. An alternative is the freely falling frame (defined by $\nabla_{u} \boldsymbol{e}_{\hat{\alpha}}=0$ for all $\alpha$) in which the connection coefficients also vanish.
  • In the (stationary) orthonormal frame, the observer's 3-velocity vanishes and so, using $\boldsymbol{u} \cdot \boldsymbol{u}=g_{00}\left(u^{0}\right)^{2}=-1$, we have $u^{0}=\left(-g_{00}\right)^{-\frac{1}{2}}$. The observer orients their orthonormal frame with $\boldsymbol{e}_{\hat{0}}=\boldsymbol{u}$, so $\boldsymbol{e}_{\hat{0}}=\left(-g_{00}\right)^{-\frac{1}{2}} \boldsymbol{e}_{0}$, or $\left(\boldsymbol{e}_{\hat{0}}\right)^{0}=\left(-g_{00}\right)^{-\frac{1}{2}}$.
  • In the orthonormal frame for a diagonal metric, we have $\left(\boldsymbol{e}_{0}\right)^{\hat{0}}=\left(-g_{00}\right)^{\frac{1}{2}}$ and $\left(\boldsymbol{e}_{i}\right)^{\hat{i}}=g_{i i}^{\frac{1}{2}}$.
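The vielbein components quoted in the summary for a diagonal metric can be illustrated with a short numerical sketch. It assumes a diagonal metric with signature $(-,+,+,+)$; the sample metric values and the function names are hypothetical and chosen only for illustration.

```python
import math

def diagonal_vielbein(g_diag):
    """Diagonal entries (e_mu)^mu-hat: sqrt(-g_00) for time, sqrt(g_ii) for space."""
    g00, *spatial = g_diag
    return [math.sqrt(-g00)] + [math.sqrt(g) for g in spatial]

def rebuild_metric(e_diag):
    """g_{mu mu} = eta_{mu-hat mu-hat} ((e_mu)^mu-hat)^2 with eta = diag(-1, 1, 1, 1)."""
    eta = [-1.0] + [1.0] * (len(e_diag) - 1)
    return [s * e * e for s, e in zip(eta, e_diag)]

def to_orthonormal(e_diag, V):
    """V^mu-hat = (e_mu)^mu-hat V^mu (no mixing of components for a diagonal metric)."""
    return [e * v for e, v in zip(e_diag, V)]

# Hypothetical sample values for (g_00, g_11, g_22, g_33) at one spacetime point.
g = [-0.5, 2.0, 9.0, 2.25]
e = diagonal_vielbein(g)

# A stationary observer has u^0 = (-g_00)^(-1/2) and vanishing 3-velocity.
u = [(-g[0]) ** -0.5, 0.0, 0.0, 0.0]
u_hat = to_orthonormal(e, u)
```

In this sketch `rebuild_metric(e)` recovers the entries of `g` up to rounding, and `u_hat` has only a unit time component, consistent with $\boldsymbol{e}_{\hat{0}}=\boldsymbol{u}$ for the stationary observer.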

Exercises

(10.1) Consider spacetime with a line element
\begin{equation*}
\mathrm{d} s^{2}=-\mathrm{d} t^{2}+a(t)^{2}\left(\mathrm{d} \chi^{2}+\sinh ^{2} \chi \, \mathrm{d} \theta^{2}+\sinh ^{2} \chi \sin ^{2} \theta \, \mathrm{d} \phi^{2}\right) \tag{10.72}
\end{equation*}
(a) Using a coordinate system $(t, \chi, \theta, \phi)$, a vector has components $V^{\mu}=\left(V^{t}, V^{\chi}, V^{\theta}, V^{\phi}\right)$ in the coordinate frame. What are the vector's components in the orthonormal frame?
(b) A $(1,2)$ tensor has a non-zero component $G^{\theta}{ }_{\chi \phi}$. What does this become in the orthonormal frame?
(10.2) (a) Working in the orthonormal frame, find the connection coefficients for flat space represented in cylindrical polar coordinates.
Hint: Remember that the connection coefficients do not transform like tensors, so you cannot simply use the vielbein. You can compute the coefficients directly from the definitions of the basis vectors, or transform using eqn 7.11.
(b) Show that the connection coefficients you have derived obey the rule $\Gamma^{\hat{\mu}}{ }_{\hat{\alpha} \hat{\beta}}-\Gamma^{\hat{\mu}}{ }_{\hat{\beta} \hat{\alpha}}=\left\langle\boldsymbol{\omega}^{\hat{\mu}},\left[\boldsymbol{e}_{\hat{\alpha}}, \boldsymbol{e}_{\hat{\beta}}\right]\right\rangle$.
(10.3) This problem combines several ideas from the last few chapters and is a useful warm-up for some of the physics in Part IV of the book.
A particle with velocity vector $\boldsymbol{u}$ travels radially in a static, spherically symmetric gravitational field described by diagonal metric components $g_{\mu \nu}$.
(a) If the timelike component of the particle's velocity 1-form is given in the static frame of the potential by a constant $u_{t}=a$, give the other components in terms of $a$ and the components of the metric.
(b) Compute the coordinate velocity $\mathrm{d} r / \mathrm{d} t$.
(c) What is the coordinate velocity, as measured by a local observer?
(10.4) Consider flat spacetime expressed in an orthogonal coordinate system $\left(x^{1}, x^{2}, x^{3}\right)$ with a diagonal metric.
(a) Show that the gradient operator acting on a function $f$ becomes
\begin{equation*}
\frac{\partial f}{\partial x^{\hat{\mu}}}=\left(\frac{1}{\left(g_{11}\right)^{\frac{1}{2}}} \frac{\partial f}{\partial x^{1}}, \frac{1}{\left(g_{22}\right)^{\frac{1}{2}}} \frac{\partial f}{\partial x^{2}}, \frac{1}{\left(g_{33}\right)^{\frac{1}{2}}} \frac{\partial f}{\partial x^{3}}\right)
\end{equation*}
(b) Prove that, in the coordinate frame with this diagonal metric, the divergence can be rewritten as
\begin{equation*}
\left(\boldsymbol{\nabla}_{\mu} \boldsymbol{v}\right)^{\mu}=\frac{\partial v^{\mu}}{\partial x^{\mu}}+\frac{1}{\sqrt{g}} \frac{\partial \sqrt{g}}{\partial x^{\mu}} v^{\mu} . \tag{10.74}
\end{equation*}
(c) Use the result from (b) to show that, expressed in terms of the components in the orthonormal frame, the divergence can be written as
\begin{equation*}
\boldsymbol{\nabla} \cdot \boldsymbol{v}=\frac{1}{\sqrt{g}}\left[\frac{\partial}{\partial x^{1}}\left(\sqrt{g_{22} g_{33}} \, v^{\hat{1}}\right)+\frac{\partial}{\partial x^{2}}\left(\sqrt{g_{33} g_{11}} \, v^{\hat{2}}\right)+\frac{\partial}{\partial x^{3}}\left(\sqrt{g_{11} g_{22}} \, v^{\hat{3}}\right)\right] . \tag{10.75}
\end{equation*}
What does this formula yield for (d) orthonormal cylindrical polar coordinates and (e) orthonormal spherical polar coordinates?
(10.5) A light signal is emitted by a source on the rim of a centrifuge and detected by a detector at another point on the rim, separated by an angle $\alpha$. Use the metric for the rotating frame
\begin{equation*}
\mathrm{d} s^{2}=-\left(1-\Omega^{2} r^{2}\right) \mathrm{d} t^{2}+\mathrm{d} r^{2}+r^{2} \mathrm{d} \theta^{2}+\mathrm{d} z^{2}+2 \Omega r^{2} \, \mathrm{d} \theta \, \mathrm{d} t \tag{10.76}
\end{equation*}
to show that there is no shift in the frequency of the signal.
(10.6) An alternative to orthonormal local basis vectors was suggested by Newman and Penrose. They considered a pair of real null vectors $\boldsymbol{l}$ and $\boldsymbol{n}$ and a pair of complex-conjugate null vectors $\boldsymbol{m}$ and $\overline{\boldsymbol{m}}$ obeying
\begin{equation*}
\boldsymbol{l} \cdot \boldsymbol{m}=\boldsymbol{l} \cdot \overline{\boldsymbol{m}}=\boldsymbol{n} \cdot \boldsymbol{m}=\boldsymbol{n} \cdot \overline{\boldsymbol{m}}=0 \tag{10.77}
\end{equation*}
The vectors are normalized according to $\boldsymbol{l} \cdot \boldsymbol{n}=1$ and $\boldsymbol{m} \cdot \overline{\boldsymbol{m}}=-1$.
(a) If we take the local basis to be
\begin{equation*}
\begin{array}{ll}
\boldsymbol{e}_{\hat{1}}=\boldsymbol{l}, & \boldsymbol{e}_{\hat{2}}=\boldsymbol{n}, \\
\boldsymbol{e}_{\hat{3}}=\boldsymbol{m}, & \boldsymbol{e}_{\hat{4}}=\overline{\boldsymbol{m}},
\end{array} \tag{10.78}
\end{equation*}
find the components of the local metric $\eta_{\hat{\mu} \hat{\nu}}$.
(b) Find the local basis 1-forms $\boldsymbol{\omega}^{\hat{\mu}}$, assuming the usual relationship $\left\langle\boldsymbol{\omega}^{\hat{\mu}}, \boldsymbol{e}_{\hat{\nu}}\right\rangle=\delta_{\hat{\nu}}^{\hat{\mu}}$.
(10.7) Suggest vielbein components for a (1+1)-dimensional metric with line element $\mathrm{d} s^{2}=-\mathrm{d} u \, \mathrm{d} v$.

  1. ${ }^{1}$ A freely falling body is one that experiences only the effects of gravity.
    ${ }^{2}$ This is sometimes called the principle of weak equivalence, but we will take the view that the weakness is an attribute of the principle. The corresponding strong principle of equivalence will be introduced on the following page. Why is this principle weak? Because it only applies to mechanical forces.
  2. This is illustrated in Fig. 7.6.